Programming

  • Introduction to Computer Science and Programming Using Python (MITx)

    This course is good for Python beginners. It deals with Python basics, rudimentary algorithms with Python, and algorithmic complexity.

    My repository: Intro2CSandPy

  • Introduction to Computational Thinking and Data Science (MITx)

    This is the follow-up to the course above. It covers graph theory and basic data science, machine learning, and statistics using Python, rather than the Python language itself.

    My repository: Intro2CTandDS

  • Introduction to R for Data Science

  • PyTorch

  • TensorFlow

  • Numpy

  • Pandas

  • Jupyter Notebook

Database

Mathematics

Take a quick look at the Curriculum on MATH is FUN to become familiar with mathematical terminology and expressions in English.

Machine Learning

References: Dugam’s answer on Quora, 8 Skills You Need to Be a Data Scientist, William Chen’s answer on Quora

I studied computer science (CS) at the undergraduate level and am now pursuing further studies in data science and machine learning. This post is for those in the same boat who want to review the CS knowledge most relevant to data science and machine learning. I follow Q4 of the FAQs for New Postgraduate Students in CSE HKUST, which states the core requirements for all postgraduate students in CS, and share related MOOCs along with my solutions to them.

Operating Systems

Design and Analysis of Algorithms

This is the most important part. Please study these two courses carefully: Algorithms and Data Structures, and Graph Algorithms.

  • Introduction to Computer Science and Programming Using Python (MITx)

    This course is good for Python beginners. It deals with Python basics, rudimentary algorithms with Python, and algorithmic complexity.

    My repository: Intro2CSandPy

  • Introduction to Computational Thinking and Data Science (MITx)

    This is the follow-up to the course above. It covers graph theory and basic data science, machine learning, and statistics using Python, rather than the Python language itself.

    My repository: Intro2CTandDS

  • Algorithms and Data Structures (Microsoft)

    My repository: AlgosNDS

  • Graph Algorithms (UCSanDiegoX)

    My repository: GraphAlgos

  • Introduction to Algorithms (MIT)

    My repository: Intro2Algos

Theory of Computation

  • Automata, Computability, and Complexity (MIT OCW)

    My repository: ACC

  • Theory of Computation (MIT OCW)

    My repository: TheoryofComp

Others

Please feel free to share any thoughts about this article and my solutions. Your feedback will help me improve.

I also recommend this: List of Awesome University Courses for Learning Computer Science

GitHub Repository Link

This repository contains general interview preparation materials for data scientist positions, mostly provided by Udacity, and is written in Python.

Image 1: Road to data scientist

Computer Science Fundamentals and Programming Topics

This section is useful for passing technical interviews at large IT companies and for working in big data computing positions.

  • Data Structures
    • Lists
    • Stacks
    • Queues
    • Strings
    • Hash Maps
    • Vectors
    • Matrices
    • Classes & Objects
    • Trees
    • Graphs
  • Algorithms
    • Recursion
    • Searching
    • Sorting
    • Optimization
    • Dynamic Programming
  • Computability and Complexity
    • P vs. NP
    • NP-Complete Problems
    • Big-O Notation
    • Approximation Algorithms
  • Computer Architecture
    • Memory
    • Cache
    • Bandwidth
    • Threads & Processes
    • Deadlocks
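
To make a few of the topics above concrete, here is a minimal sketch in Python of searching (binary search, O(log n) on a sorted list) and dynamic programming (memoized Fibonacci, O(n) instead of exponential time). The function names are my own and are for illustration only; they are not taken from the Udacity materials.

```python
from functools import lru_cache


def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


def fib_naive(n):
    """Plain recursion: recomputes subproblems, so it takes exponential time."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recurrence with memoization: each n is computed only once."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)
```

Calling `fib_memo(30)` returns quickly because every subproblem is cached, while `fib_naive(30)` makes over a million recursive calls for the same answer.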

Probability and Statistics Topics

These topics are important for understanding machine learning theory.

  • Basic Probability
    • Conditional Probability
    • Bayes Rule
    • Likelihood
    • Independence
  • Probabilistic Models
    • Bayes Nets
    • Markov Decision Processes
    • Hidden Markov Models
  • Statistical Measures
    • Mean
    • Median
    • Mode
    • Variance
    • Population Parameters vs. Sample Statistics
  • Proximity and Error Metrics
    • Cosine Similarity
    • Mean-Squared Error
    • Manhattan and Euclidean Distance
    • Log-Loss
  • Distributions and Random Sampling
    • Uniform
    • Normal
    • Binomial
    • Poisson
  • Analysis Methods
    • ANOVA
    • Hypothesis Testing
    • Factor Analysis
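
The proximity and error metrics listed above are short enough to write out directly. The following is a rough sketch using only the Python standard library; the function names and the `eps` clipping value in `log_loss` are my own assumptions, and in practice a library such as NumPy or scikit-learn would normally be used instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance (L2 distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)
```

For example, `euclidean_distance([0, 0], [3, 4])` gives 5.0 while `manhattan_distance([0, 0], [3, 4])` gives 7, which shows how the two distances differ on the same pair of points.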

Data Modeling and Evaluation Topics

  • Data Preprocessing
    • Munging/Wrangling
    • Transforming
    • Aggregating
  • Pattern Recognition
    • Correlations
    • Clusters
    • Trends
    • Outliers & Anomalies
  • Dimensionality Reduction
    • Eigenvectors
    • Principal Component Analysis
  • Prediction
    • Classification
    • Regression
    • Sequence Prediction
  • Evaluation
    • Train/Test Split
    • Cross Validation
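
As an illustration of the evaluation topics above, here is a minimal sketch of how k-fold cross-validation splits a dataset into training and testing indices. The helper `k_fold_indices` and its `seed` parameter are hypothetical names of my own; scikit-learn provides a production version of the same idea.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Each sample is used for testing exactly once across the k rounds.
for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print(len(train_idx), len(test_idx))
```

A single train/test split corresponds to taking just one of these rounds; averaging a model's score over all k rounds gives a less noisy estimate.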

Applying Machine Learning Algorithms and Libraries Topics

  • Models
    • Parametric vs. Nonparametric
    • Decision Tree
    • Nearest Neighbor
    • Neural Net
    • Support Vector Machine
    • Ensemble of Multiple Models
  • Learning Procedure
    • Linear Regression
    • Gradient Descent
    • Genetic Algorithms
    • Bagging
    • Boosting
    • Regularization
    • Hyperparameter Tuning
  • Tradeoffs and Gotchas
    • Bias and Variance
    • Overfitting and Underfitting
    • Vanishing/Exploding Gradients
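
Several of the learning-procedure topics above fit together in one small example. This is a sketch, under my own assumptions, of fitting a one-variable linear regression by batch gradient descent on the mean-squared error, with an optional L2 regularization term. The function name, the toy data, and the hyperparameter values (`lr`, `epochs`, `l2`) are illustrative choices only.

```python
def fit_linear_regression(xs, ys, lr=0.05, epochs=2000, l2=0.0):
    """Fit y = w * x + b by gradient descent on mean-squared error.

    l2 is the strength of an optional L2 penalty on w (0.0 disables it).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) + l2 * w^2
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n + 2 * l2 * w
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
```

The learning rate `lr` is a hyperparameter: too large a value makes the updates diverge, and too small a value makes convergence slow. Setting `l2` above zero shrinks `w` toward zero, which is the basic mechanism behind regularization.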

Software Engineering and System Design Topics

  • Software Interface
    • Database
  • User Interface
    • Data Visualisation
  • Scalability
    • Map-reduce
    • Distributed Processing
  • Deployment
    • Cloud Hosting
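
The map-reduce idea listed under Scalability can be sketched on a single machine with a word count. This is only an illustration of the pattern, with made-up input chunks and function names of my own; a real distributed framework would run the map phase on many machines in parallel.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """Map: count the words within one chunk of text independently."""
    return Counter(chunk.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


# Hypothetical input, already split into chunks as a cluster would do.
chunks = ["big data big compute", "data pipelines move data"]
word_counts = reduce(reduce_phase, map(map_phase, chunks), Counter())
```

Because each chunk is counted independently, the map phase can be distributed, and only the small partial counts need to be combined in the reduce phase.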

Table 1: Comparison of data-related jobs

References: What is Data Science? 8 Skills That Will Get You Hired in Data