DATA SCIENCE Virtual Hands-On Training

About Training & Trainer

Training Title

Predicting Business Performance through Data Science – A State-of-the-Art Hands-on Training with Roadmap for Successful Business Implementation

Training Duration

5 weeks, with 2 classes per week (4 hours each) on Fridays and Saturdays.

Executive Summary

Data Science is the process of extracting useful business insights from complicated business data. These insights often take the form of predictions, e.g., predicting churning customers, customers to target for marketing, empty ATMs, overstocked inventories, and sales. The data can be of many different types and stored across several databases. It is also typically dirty, i.e., not clean enough for analytical work. Using such data for predictions is a complicated, lengthy, and expensive task.

This training will take the participants on an exotic journey: they will learn the core art of data science through comprehensive hands-on activities, and at the same time, they will acquire knowledge on how to ensure success in a business scenario. For this, we will provide a complete roadmap based on state-of-the-art technologies and our previous corporate knowledge.

Major Take-Aways

Core knowledge of data science and standard implementation processes
Hands-on knowledge of all state-of-the-art data science algorithms (classification, regression, customer segmentation)
How do I determine the best predictive result for a given problem?
If I deploy the predictive model in operations, how will it maintain its high performance even in the face of new customer data?
What is needed to launch a data science initiative in my company?
What mistakes can data scientists make?
What mistakes can the business side make?
How do I ensure a successful data science initiative that is both cost-effective and high-performing?

Training Methodology

The training will be conducted completely online (on Zoom or related tools). It covers a large set of topics related to data science, the large majority of which are associated with hands-on activities. For hands-on sessions, we will use the Python language in the Jupyter Lab environment; both are standard hands-on tools for teaching data science at an international level. Participants will be asked to use their own laptops, and help will be provided with the installation of Python and Jupyter Lab before the training starts.

For hands-on activities, the instructor will first demo the data science programming code. For each topic, the participants will then complete practice exercises in class as well as take-home practice assignments. This will ensure the development of data science skills, along with knowledge of expected outputs and methods of improving the predictions. Hands-on sessions (demo and practice combined) will comprise at least 85% of all participants’ activities in the training. Of the remaining 15%, at least 10% will also be practical, albeit not programming related: while being taught data governance theory, the participants will design data governance initiatives for their own organizations. The last 5% will be purely theoretical, covering data science concepts.

All programming code for the demos and practice questions (both in-class and take-home) will be provided to participants, along with the course contents, at the beginning of the class. Handouts of important theoretical material will also be provided before each class. The practice sessions will be checked on a weekly basis and feedback will be provided to the participants. A WhatsApp group of the participants will be created to enhance interaction, collect their feedback, and address their concerns on a rolling basis.

Commercial Impact of Training

The participants of this training will achieve the following benefits, i.e., commercial impact:

Getting an education in AI is challenging and requires persistence and personal initiative. The shortage of AI skills is seen as a major barrier to the pace of the technology’s adoption. In fact, a recent poll confirmed that 56% of senior AI professionals believed that a lack of additional, qualified AI workers was the single biggest hurdle to be overcome in terms of achieving the necessary level of AI implementation across business operations. By closing the AI skills gap, we are offering a competitive edge to participant SMEs.
Overall industry trends are shifting towards AI and Machine Learning. With certified resources in ML, the chances of finding new business opportunities in terms of sales and new product/project development are high.
It can be tough to recruit new technology workers in a tight labor market. Through ML training, an SME can boost the number of internal workers with data science skills.
Between 2012 and 2017, the number of data scientist jobs on LinkedIn increased by more than 650 percent (KDnuggets). Once an SME has highly skilled resources, resource outsourcing and job creation will increase in both local and international markets.
C-level executives will be able to start data governance initiatives in their organizations to lay a strong foundation and framework for regulatory compliance, data security, and data analytics, e.g., to comply with the GDPR.
Every data-related employee in the organization will be able to clearly understand their own position in the data-centric organization, as well as the roles and responsibilities of other employees.
A robust data-centric SME will be created based on data governance policies, leading to effective business predictions, increased customer satisfaction, and enhanced ROI.

Trainer:

Dr. Tariq Mahmood

Dr. Tariq Mahmood is the Chief Data Scientist and Analytics Project Director with the Frontier Technology Institute (www.frontiertechnologyinstitute.com). Previously, he served as the Project Director with Codex (www.codexnow.com), Nexdegree (www.nexdegree.com), and Vectracom (www.vectracom.com). In these companies, he has served as the Analytics Business Development Manager, Chief Data Scientist, and Analytics Project Director. Dr. Tariq is also a Professor at the prestigious Institute of Business Administration (IBA), Karachi. He has 15 years of professional and research experience in the domains of Business Intelligence, Data Warehousing, Data Science, Machine Learning, Big Data Analytics, and Advanced Analytics. Dr. Tariq also designed and supervises the Diploma for Big Data Analytics offered by IBA Karachi, a rigorous, hands-on course focusing on big data analytics in the real sense with the Apache Hadoop ecosystem, containerization, Apache Spark, and NoSQL databases, along with lambda, kappa, and zeta architectures (https://cict.iba.edu.pk/BigDataAnalytics.php). Dr. Tariq has designed such for the healthcare and telecommunication sectors. He has conducted numerous corporate and academic training sessions and workshops on data science and big data analytics, for both government and private organizations. He also heads the Big Data Analytics Laboratory (BDA-LAB) at IBA Karachi.

Course Outline

Lecture 1: Foundation Class

What does it mean? Evolution of the term “Data Science” (from the 1990s to 2020) and the standard Data Science definition
What is inside it? – The Business and Mathematical components of Data Science
Clearing the confusion: Difference between Machine Learning, Predictive Analytics and Data Science
How does it work? Microsoft’s standard Data Science process
Corporate application domains of Data Science
Data Science Success Stories

Lecture 2: EDA – Part I (Hands-on Activities)

Why does data get dirty? – IT department, database design, spaghetti, lack of data governance, business evolution, mergers, acquisitions, cost-cuts, global village landscape, hardware limitations, human errors
Hands-on: What are the types of dirty data? – missing values, incorrect values, incomplete values, outliers, noise, duplicated data, unnecessary data, outdated data, new data
Hands-on: Practical examples of dirty data
Hands-on: How can data governance and data assessment solve dirty data and other data-related issues? Developing and implementing standard rules for data collection and ingestion, metadata and master data management, data storage (NoSQL or relational or both?), querying, analysis and visualization

Lecture 3: EDA – Part II (Hands-on Activities)

Hands-on: Detecting and dealing with missing values
Hands-on: Detecting and dealing with incorrect and incomplete data
Hands-on: Detecting and dealing with outliers and noise
Hands-on: Detecting and dealing with duplicated, outdated and useless data
Hands-on: Understanding categorical data through frequency bar graphs and correlation through chi-squared tests
Hands-on: Understanding numerical data through boxplots, histograms and correlation through Pearson tests, t-test and ANOVA
Hands-on: Preparing the EDA Report – FTI’s EDA Template
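
As a flavor of the hands-on activities in this lecture, the following is a minimal sketch of two of the listed steps: filling missing values and flagging outliers. It is an illustration only, not the course's own code. The column name `monthly_spend`, the sample numbers, median imputation, and the 1.5 × IQR rule are all assumptions chosen for the example, and it uses only the Python standard library.

```python
from statistics import median, quantiles

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical customer data: None marks a missing value.
monthly_spend = [120, 135, None, 128, 142, 131, 990, None, 125]

cleaned = impute_missing(monthly_spend)
outliers = iqr_outliers(cleaned)

print("Cleaned :", cleaned)
print("Outliers:", outliers)
```

Whether a flagged value is a genuine error or a legitimate high-value customer is a business decision, which is why detection and treatment are listed as separate steps.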

Lecture 4: Laying the Practical Foundation of Data Science (Hands-on Activities)

Hands-on: Definition and data set transformation for a classification task
Hands-on: Determining the classification performance metrics
Hands-on: Understanding Feature Engineering and its importance to Data Science
Hands-on: Understanding Cross-Validation and its importance to Data Science
Difference between Ensemble and Traditional Algorithms
Hands-on: FTI’s Machine Learning Methodology
Hands-on: Datasets: Directed Marketing, Customer Churn, Predictive Maintenance
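
To make two of the topics above concrete (classification performance metrics and cross-validation), here is a small sketch in plain Python. It is not FTI's methodology. The function names, the churn labels, and the choice of contiguous, unshuffled folds are assumptions made for illustration; in practice, folds are usually shuffled or stratified.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

# Hypothetical churn labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
folds = list(k_fold_indices(n_samples=10, k=3))

print(metrics)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample appears in exactly one test fold, so every record is used for evaluation once and for training in the other folds.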

Lecture 5: Supervised Learning with Classification – Part I (Hands-on Activities)

Hands-on: K-NN Algorithm
Hands-on: Naive Bayes Algorithm
Hands-on: SVM Algorithm
Hands-on: Decision Tree Algorithm

Lecture 6: Supervised Learning with Classification – Part II (Hands-on Activities)

Hands-on: Logistic Regression Algorithm
Hands-on: Random Forest Algorithm
Hands-on: XGBoost Algorithm
Hands-on: LightGBM Algorithm

Lecture 7: Classification Performance Comparison (Hands-on Activities)

Which algorithm works best?
Hands-on: Which algorithm works best under which situation?
Hands-on: Effects of Feature Engineering, Algorithm Selection, and Data Splitting method on predictive performance
Gold Nugget: How can I obtain the best predictive performance for my problem?

Lecture 8: Supervised Learning with Regression (Hands-on Activities)

Definition and data set transformation for a regression task
Hands-on: Determining the regression performance metrics
Hands-on: Multiple Linear Regression
Hands-on: Random Forest Regression
Gold Nugget: How can I obtain the best predictive performance for my problem?
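
For the regression performance metrics mentioned above, a minimal sketch of three commonly used measures follows: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). The outline does not say which metrics the course uses, so this selection, the function name, and the sales figures are assumptions for illustration.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R^2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Hypothetical weekly sales: actual values vs. model predictions.
actual_sales = [100, 150, 200, 250]
predicted_sales = [110, 140, 210, 240]

scores = regression_metrics(actual_sales, predicted_sales)
print(scores)
```

MAE and RMSE are in the same units as the target (here, sales), while R² is unitless, which makes it easier to compare across problems.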

Lecture 9: Customer Segmentation with Unsupervised Learning (Hands-on Activities)

Definition and techniques
Hands-on: Performance metrics of Clustering
Hands-on: Customer Segmentation with K-means algorithm
Hands-on: Customer Segmentation with Mean shift algorithm
Hands-on: Difference between the clustering outputs and selecting the best clusters (segments)

Lecture 10: How to Ensure a Successful Data Science Initiative – A Roadmap

I don’t have data governance but still need it for data cleaning. What is the fastest way of doing it?
Do we hire a data science team or build an in-house skill set?
Do we use Cloud or not?
How to determine expectations of the business side and how to meet these expectations?
What outputs to expect from the data science team and how to test their outputs?
How to deploy the model live?
How to ensure that the model’s accuracy will not decrease below a certain expected threshold?
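
One possible way to approach the last question is to track the accuracy of a deployed model over a sliding window of recent predictions and raise a flag when it falls below the agreed threshold. The sketch below is only an illustration of that idea, not the roadmap taught in the lecture. The class name `AccuracyMonitor`, the window size, and the 0.8 threshold are hypothetical.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent `window` labelled predictions."""

    def __init__(self, threshold, window=100):
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)  # True where prediction was right

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing observed yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        acc = self.accuracy()
        return acc is not None and acc < self.threshold

monitor = AccuracyMonitor(threshold=0.8, window=5)

# First batch: every prediction matches the actual outcome.
for predicted, actual in [(1, 1), (0, 0), (1, 1), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
healthy = monitor.needs_attention()

# New customer data arrives and the model starts to miss.
for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
degraded = monitor.needs_attention()

print("After first batch :", healthy)
print("After second batch:", degraded, "accuracy =", monitor.accuracy())
```

A flag like this would typically trigger investigation or retraining; it relies on actual outcomes becoming available after deployment, which is itself something to plan for.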