Prerequisites
Programming
-
Introduction to Computer Science and Programming Using Python (MITx)
This course is good for Python beginners. It deals with Python basics, rudimentary algorithms with Python, and algorithmic complexity.
My repository: Intro2CSandPy
-
Introduction to Computational Thinking and Data Science (MITx)
This is the follow-up to the course above. Rather than Python itself, it covers graph theory and the basics of data science, machine learning, and statistics using Python.
My repository: Intro2CTandDS
-
Introduction to R for Data Science
-
PyTorch
-
TensorFlow
-
Numpy
-
Pandas
-
Jupyter Notebook
Database
-
Database Infrastructure Fundamentals (Microsoft)
Database knowledge is also necessary in data science. This course covers not only query syntax but also database infrastructure fundamentals.
My repository: DBINfraFunda
-
SQL for Data Analysis (Udacity)
-
Database MOOCs (Stanford)
-
SQL Tutorial (W3Schools)
Mathematics
Take a quick look at the curriculum on Math is Fun to become familiar with mathematical terminology and expressions in English.
-
Calculus
-
Linear Algebra
-
Discrete Mathematics
-
Probability and Statistics
Machine Learning
-
Statistical Learning
-
Deep Learning
Deep Learning from Scratch
References: Dugam’s answer on Quora, 8 Skills You Need to Be a Data Scientist, William Chen’s answer on Quora
I studied computer science (CS) as an undergraduate and am now pursuing further studies in data science and machine learning. This post is for those in the same boat who want to review the CS knowledge relevant to data science and machine learning. I follow Q4 of the FAQs for New Postgraduate Students in CSE HKUST, which states the core requirements for all postgraduate students in CS, and share related MOOCs along with my solutions to them.
Operating Systems
-
Introduction to Operating Systems (Udacity)
This course requires knowledge of C. If you want to learn C first, see Practical Programming in C (MIT OCW).
My repository: Intro2OS
Design and Analysis of Algorithms
This is the most important part. Please work through these two courses carefully: Algorithms and Data Structures, and Graph Algorithms.
-
Introduction to Computer Science and Programming Using Python (MITx)
This course is good for Python beginners. It deals with Python basics, rudimentary algorithms with Python, and algorithmic complexity.
My repository: Intro2CSandPy
-
Introduction to Computational Thinking and Data Science (MITx)
This is the follow-up to the course above. Rather than Python itself, it covers graph theory and the basics of data science, machine learning, and statistics using Python.
My repository: Intro2CTandDS
-
Algorithms and Data Structures (Microsoft)
My repository: AlgosNDS
-
Graph Algorithms (UCSanDiegoX)
My repository: GraphAlgos
-
Introduction to Algorithms (MIT)
My repository: Intro2Algos
Theory of Computation
-
Automata, Computability, and Complexity (MIT OCW)
My repository: ACC
-
Theory of Computation (MIT OCW)
My repository: TheoryofComp
Others
-
C
My repository:
-
C++
My repository: Intro2Cpp
-
Java
My repository: Intro2Java
Please feel free to share any thoughts about this article and my solutions. Your feedback will help me improve.
I also recommend this: List of Awesome University Courses for Learning Computer Science
GitHub Repository Link
This repository contains general interview preparation materials for data scientist positions, mostly provided by Udacity. The code is written in Python.
Computer Science Fundamentals and Programming Topics
This section is useful for passing technical interviews at large IT companies and for working in big data computing positions.
- Data Structures
  - Lists
  - Stacks
  - Queues
  - Strings
  - Hash Maps
  - Vectors
  - Matrices
  - Classes & Objects
  - Trees
  - Graphs
- Algorithms
  - Recursion
  - Searching
  - Sorting
  - Optimization
  - Dynamic Programming
- Computability and Complexity
  - P vs. NP
  - NP-Complete Problems
  - Big-O Notation
  - Approximate Algorithms
- Computer Architecture
  - Memory
  - Cache
  - Bandwidth
  - Threads & Processes
  - Deadlocks
Probability and Statistics Topics
These topics are important for understanding machine learning theory.
- Basic Probability
  - Conditional Probability
  - Bayes' Rule
  - Likelihood
  - Independence
- Probabilistic Models
  - Bayes Nets
  - Markov Decision Processes
  - Hidden Markov Models
- Statistical Measures
  - Mean
  - Median
  - Mode
  - Variance
  - Population Parameters vs. Sample Statistics
- Proximity and Error Metrics
  - Cosine Similarity
  - Mean-Squared Error
  - Manhattan and Euclidean Distance
  - Log-Loss
- Distributions and Random Sampling
  - Uniform
  - Normal
  - Binomial
  - Poisson
- Analysis Methods
  - ANOVA
  - Hypothesis Testing
  - Factor Analysis
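As a quick refresher on the proximity and error metrics listed above, here is a minimal sketch using only the Python standard library. The function names and the `eps` clipping value in `log_loss` are my own choices for illustration, not taken from any course material, and the log-loss shown is the binary version only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences (L2 distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log-loss. eps (an assumed value) keeps log() away from 0."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)
```

For example, `manhattan_distance([0, 0], [3, 4])` gives 7 while `euclidean_distance([0, 0], [3, 4])` gives 5.0, which shows how the two distances differ on the same pair of points.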
Data Modeling and Evaluation Topics
- Data Preprocessing
  - Munging/Wrangling
  - Transforming
  - Aggregating
- Pattern Recognition
  - Correlations
  - Clusters
  - Trends
  - Outliers & Anomalies
- Dimensionality Reduction
  - Eigenvectors
  - Principal Component Analysis
- Prediction
  - Classification
  - Regression
  - Sequence Prediction
- Evaluation
  - Training/Testing Split
  - Cross Validation
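The two evaluation items above can be sketched in plain Python as follows. This is only an illustration under my own assumptions: the function names, the default `test_ratio`, and the fixed `seed` are hypothetical, and in practice a library such as scikit-learn would normally be used instead.

```python
import random


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross validation.

    Every index appears in exactly one validation fold. When n_samples is
    not divisible by k, the first folds get one extra sample each.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size
```

A model would be trained once per fold on `train_idx` and scored on `val_idx`, and the k scores averaged. The indices here are not shuffled, so the data should be shuffled beforehand if its order carries meaning.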
Applying Machine Learning Algorithms and Libraries Topics
- Models
  - Parametric vs. Nonparametric
  - Decision Tree
  - Nearest Neighbor
  - Neural Net
  - Support Vector Machine
  - Ensemble of Multiple Models
- Learning Procedure
  - Linear Regression
  - Gradient Descent
  - Genetic Algorithms
  - Bagging
  - Boosting
  - Regularization
  - Hyperparameter Tuning
- Tradeoffs and Gotchas
  - Bias and Variance
  - Overfitting and Underfitting
  - Vanishing/Exploding Gradients
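To connect two of the learning-procedure items above, here is a rough sketch of fitting a one-variable linear regression with batch gradient descent on the mean-squared error. The function name, the learning rate, and the step count are assumptions chosen for this small example. They are not tuned values, and a learning rate that is too large for the data will make the updates diverge.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y ~ w * x + b by minimizing mean-squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Residuals of the current line on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean-squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```

On noise-free data generated from y = 2x + 1, such as `xs = [0, 1, 2, 3, 4]` and `ys = [1, 3, 5, 7, 9]`, the returned `w` and `b` should approach 2 and 1.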
Software Engineering and System Design Topics
- Software Interface
  - Database
- User Interface
  - Data Visualisation
- Scalability
  - Map-reduce
  - Distributed Processing
- Deployment
  - Cloud Hosting
References: What is Data Science? 8 Skills That Will Get You Hired in Data