# Data Science (Machine Learning)

This course teaches how to use Python for Data Science and Machine Learning. It takes you through the life cycle of a Data Science project using Python tools and libraries.

• Prerequisite: Python Language
• Theory Fee: Rs. 6500/- (Includes digital course material)
• Digital or Physical Certificate Fee: Rs. 200/-
• Software Required: Anaconda from anaconda.com/downloads

## Detailed Syllabus

### Introduction to Data Science

• What is Data Science
• What is Machine Learning
• What is Deep Learning
• Role of Data Scientist
• Applications of Data Science
• Data and its sources
• Overview of Data Science Life Cycle

### Working with Anaconda and Jupyter Notebook

• Starting Jupyter Notebook
• UI elements of Notebook
• Kernel and types of cells - Code and Markdown
• Modes - Edit and Command
• Magic functions - Line and Cell functions
• Keyboard shortcuts - Command mode and Edit mode shortcuts
• Using Jupyter Lab

### Basic Statistics

• Mean, Median, Mode and Range
• Using statistics module
• Variance and Standard Deviation
• Quartiles and IQR
• Understanding distribution of data using Histogram and Box plot
• Measuring Skewness and Kurtosis
• Probability
• Correlation between variables
• Using scipy.stats module
• Scatter plot to understand correlation
• Regression Analysis
• Understanding intercept and slope - predict y given X
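As an illustration of the last item, here is a minimal sketch of fitting a straight line by least squares and predicting y for a new X. It uses only the standard library `statistics` module; the function name `fit_line` and the sample data (`hours`, `scores`) are made up for this example and are not part of the course material.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical sample data: every extra hour adds 2 points
hours = [1, 2, 3, 4, 5]
scores = [3, 5, 7, 9, 11]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # predict y for X = 6

print(f"slope={slope}, intercept={intercept}, prediction for X=6: {predicted}")
```

The slope is the change in y for a one-unit change in X, and the intercept is the predicted y when X is 0. In practice the same fit is usually obtained from a library routine rather than written by hand.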

### NumPy

• Creating single and multi-dimensional arrays
• Using indexing and slicing
• Using fancy indexing (boolean indexing and array of indices)
• Array operations, methods of ndarray and universal functions
• View vs. Copy of array
• Reshaping arrays
• Stacking and splitting arrays
• Applying Linear Algebra
• Image processing with Arrays
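The "View vs. Copy" topic above can be sketched as follows. This is a small illustration, assuming NumPy is installed (it ships with Anaconda); the variable names are arbitrary.

```python
import numpy as np

a = np.arange(6)          # array([0, 1, 2, 3, 4, 5])

view = a[1:4]             # basic slicing returns a view on the same data
copy = a[1:4].copy()      # .copy() allocates independent data
fancy = a[[1, 2]]         # fancy indexing (array of indices) returns a copy

view[0] = 100             # writes through to a[1]
copy[1] = -1              # does not affect a
fancy[0] = -5             # does not affect a

print(a)                               # a[1] is now 100, the rest is unchanged
print(np.shares_memory(a, view))       # True
print(np.shares_memory(a, copy))       # False
print(np.shares_memory(a, fancy))      # False
```

Modifying a view changes the original array, which is why an explicit `.copy()` is worth making whenever the original data must stay intact.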

### Pandas

• Working with Series
• Applying methods on Series
• Working with DataFrame
• Reading data into DataFrame and writing DataFrame to other formats
• Selecting rows and columns in DataFrame
• Adding and deleting rows and columns in DataFrame
• Working with apply() and applymap() functions
• Working with str attribute for string manipulations
• Joining, Merging and Concatenating DataFrames
• Grouping data on one or more columns
• Using pivot_table()
• Data Wrangling - Binning, Encoding etc.
• Handling null values
• Drawing plots using Pandas

### Matplotlib

• Anatomy of a figure
• Working with Module API and Object API
• Working with different plots - Histogram, Bar, Stacked Bar, Pie, Scatter, Line
• Creating multiple axes in single figure
• Customizing plots - labels, legends, scales, titles, text etc.

### Seaborn

• Figure-level vs. Axes-level plots
• Categorical, Relational, Distribution, Regression and Matrix Plots
• Using parameters like hue, row and col

### Data Science Workflow (Life Cycle) using Classification Case Study

• What is the question
• Data Acquisition
• Preparing data - cleaning and organizing data
• Exploratory Data Analysis (EDA)
• Data Munging/Data Wrangling
• Feature Engineering
• Data Visualization
• Model Building
• Model Evaluation
• Model Deployment

### Machine Learning Workflow with Classification Case Study

• Understanding pre-processing concepts like Standardization, Encoding etc.
• Training a model using a train and test split
• Using different algorithms such as Logistic Regression, Support Vector Machines, k-Nearest Neighbors, Naive Bayes, Decision Tree and Random Forest with Scikit-learn
• Evaluating the results of the model using metrics - classification report, confusion matrix
• Understanding Precision, Recall, F1 Score, Specificity and Sensitivity
• Understanding cross validation and how to use it to train and test a model
• Presenting the model - Deployment of the model
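To make the evaluation metrics listed above concrete, here is a pure-Python sketch that derives them from the four cells of a binary confusion matrix. The labels in `y_true` and `y_pred` are invented for illustration, and the helper name `confusion_counts` is hypothetical; the course itself works with Scikit-learn, which provides equivalent reports.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


# Hypothetical actual and predicted labels
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_true[2] = 1  # four positives and four negatives in total
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)       # of the predicted positives, how many are right
recall = tp / (tp + fn)          # also called sensitivity
specificity = tn / (tn + fp)     # of the actual negatives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"specificity={specificity:.2f} f1={f1:.2f}")
```

Recall and sensitivity are two names for the same quantity, while specificity is the corresponding rate for the negative class.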

### Working with Regression Case Study

• How to use metrics - MSE, RMSE, R2 Score etc. to evaluate a model
• Understanding Regularization - Lasso and Ridge
• Understanding ensemble algorithms - Bagging and Boosting
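The regression metrics named in the first item above can be computed by hand as in the sketch below. It uses only the standard library; the values in `y_true` and `y_pred` are invented for illustration, and the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) for paired actual and predicted values."""
    n = len(y_true)
    # Residual sum of squares: squared prediction errors
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the mean of the actuals
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, r2


# Hypothetical actual and predicted values
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 9]

mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

RMSE is expressed in the same units as the target, which makes it easier to interpret than MSE, while the R2 Score compares the model against always predicting the mean.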