www.cict.iba.edu.pk
education providing professional diplomas, certifications, and workshops in the field of Information technology, Information
Systems, and Computer Science. The CICT was established in 2016 to provide high-quality professional education to the private
and public sectors in Pakistan. The CICT is associated with renowned faculty members who conduct courses and
workshops that contribute to digitalization in the educational and professional sectors.
The Center for Information & Communication Technology (CICT) aspires to meet the highest standards of IT excellence
required in pursuit of management strategies.
The Center for Information & Communication Technology (CICT) aims to provide an excellent teaching and research environment,
especially in Information Technology, to produce students/professionals who distinguish themselves by their professional
competence, research, entrepreneurship, humanistic outlook, ethical rectitude, pragmatic approach to problem solving,
managerial skills, and ability to respond to the challenge of socio-economic development, and who serve as the vanguard of the
techno-industrial transformation of society.
 Be respectful to your teachers and classmates. Any kind of disrespect or misbehavior is grounds for dismissal from the course.
 Be on time for your classes and attend regularly.
 All students are required to carry the ID cards provided by the administration to avoid any inconvenience when entering the premises.
 If a card is lost, the student is required to report to the department to state their case.
 Avoid using mobile devices during your sessions. The instructor has the right to dismiss any student found using a phone during class.
 Do not bring any food or drinks into the classrooms, especially the computer labs; this is strictly prohibited.
 Don't forget to turn off the machine once you are done using it.
 Do not plug in external devices without scanning them for computer viruses.
Note: IBA plays a vital role in maintaining decorum, which applies to every candidate entering the premises. The campus is smoke-free;
any candidate found engaging in misconduct will be charged and may be dismissed from the course.
The convergence of big data and machine learning with technologies such as cloud services, sensors, ubiquitous computing, mobile
devices, and the Internet of Things has created vast new opportunities for business. Analytics has become a competitive and
sustainable advantage for many organizations. To harness the benefits of big data and machine learning, however, business
leaders face the pressing challenge of not only acquiring the right technologies and talent to analyze and interpret the data but also
weaving a data-centric mindset into the organization's structure and cultural fabric.
This Four-Month Diploma will empower participants with the skills and confidence to tackle data-driven opportunities and accelerate
data-analysis transformation in their organization. Through lectures, case studies, and discussions, participants will gain real-world
insights into various applications of big data analytics and machine learning, and how they can be used to fuel better decision-making
within the context of the attendee's own Department/Organization.
Mr. Sohail Imran:
For more than 18 years, Mr. Sohail Imran has been conducting
training and workshops on databases (SQL and
NoSQL), Big Data Infrastructure, and Machine
Learning for different institutes, universities, and the
corporate sector. He has more than 8 years of professional
experience in Big Data Analytics, Data Science, Data
Mining, Data Warehousing, and DBMS (SQL and
NoSQL). He provides consultancy in designing and
developing Big Data Analytics platforms using Java,
RapidMiner, Radoop, Python, Hadoop, Hive, Spark,
Kafka, Spark Streaming, Storm, etc.
Mr. Muhammad Rizwan:
Muhammad Rizwan provides digital leadership to
organizations, from strategy to execution, globally and
locally. In his 25+ year corporate career, he has worked in both
public and private equity spaces with technology-led and
digitally-enabled businesses in various management
roles, including C-Suite. He holds a certificate in Machine
Learning from Stanford University, a master's degree
from Hamdard, and a bachelor's degree in Statistics/Commerce
and Information Systems. He also holds an international
diploma in Software Engineering. He currently heads
information systems at Dollar Industries (Pvt) Ltd. He has
worked on projects for Hino Pak Motors, Karachi Stock Exchange, and
CPLC (car theft software).
Dr. Affan Alim:
Dr. Muhammad Affan Alim has 16 years of teaching, research, and development experience in
Machine Learning, Deep Learning, Data Science, pattern classification, computer vision,
Optimization of models, and statistical & mathematical analysis. He also has several years of
professional experience in software development in Pakistan and the United Kingdom (UK). He has
developed several industry-based projects.
Mr. Muhammad Shamim Ahmed:
With 10 years of overall experience in project management and Oracle functional consulting,
and 4 years as an Oracle University trainer, Mr. Shamim has been a great asset to IBA
CICT. He has executed 5 projects to date in various capacities, with proven experience across
the complete life cycle of Oracle EBS implementation. He has been actively involved in all
stages of project management, such as initiation, Business Blueprint (AS-IS Document),
GAP Analysis, Solution design, and much more. He is exceptionally motivated, energetic,
and enthusiastic about learning and teaching.
I. Python Foundations for Big Data Analytics
 History and Evolution of Python
 Advantages of Apache Spark with Python in a Big Data Environment
 Setting up Big Data Programming Development Environments
 Programming Language Basics
 Collections and their types
 Conditional Control Structures
 Iterative Control Structures
 Methods with Practice Examples
 Module and File I/O
 Object-Oriented Programming Concepts
 Apache Spark and Python for Machine Learning
 Exam
II. Big Data Wrangling
 Introduction to data science; roles of the database designer, data engineer, data analyst, and data scientist; what data scientists do.
 The available formats of data and the types of data a data scientist receives. The life cycle of data science; data science competitions.
 The difference between wrangling and feature engineering; the steps of wrangling in detail.
 Different methods of reading a dataset using pandas in Python.
 Steps of feature engineering for machine learning, with some real examples discussed along the way. Required tools and techniques of data science.
 Missing data imputation using pandas: how to find the missing values in a dataset and why handling missing values is important.
 The fillna() method with different parameters for missing values.
 Dropping all rows with missing data in a DataFrame, and dropping rows with missing data in a specific column.
 How to fill using aggregate values, how to fill forward and backward, and how to fill with reference to other columns. Practice questions.
 A real application-based problem for missing values.
 What an outlier is, what impact outliers have on data, and how to find and visualize outliers.
 Outlier removal strategies; performing winsorization; a Python implementation.
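The missing-value and outlier topics above can be illustrated with a small, dependency-free sketch. The module itself uses pandas; the helper names below (`fill_with_mean`, `forward_fill`, `winsorize`) and the sample columns are hypothetical and only mirror the ideas of filling with an aggregate, filling forward, and clipping extreme values to chosen percentiles.

```python
from statistics import mean, quantiles

def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def forward_fill(values):
    """Replace each None with the most recent non-missing value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been seen yet
    return filled

def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the given lower and upper percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]

ages = [25, None, 31, None, 40]      # hypothetical column with gaps
print(fill_with_mean(ages))          # [25, 32, 31, 32, 40]
print(forward_fill(ages))            # [25, 25, 31, 31, 40]

incomes = [30, 32, 35, 36, 38, 40, 41, 43, 45, 500]  # 500 is an outlier
print(winsorize(incomes, lower_pct=10, upper_pct=90))
```

Winsorization keeps every row and only pulls the extremes inward, which is why it is often preferred over dropping outliers when the dataset is small.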
 Three case studies for EDA
 Exploratory data analytics
 Handling structural issues in the dataset: inconsistencies in dates and other attributes, and stray characters attached to values. Exploring these issues.
 Handling the structuring issues using regex
 Categorical variables: encoding categorical variables with one-hot encoding, dummy encoding, and effect encoding; pros and cons of each encoding
 Discussion of project
 Live demonstration of Kaggle and its features
 How Kaggle is helpful for data scientists
 Exam
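As a rough illustration of the regex clean-up and categorical encoding topics in this module, the sketch below uses only the standard library. The function names, the sample strings, and the assumption that dates arrive as day/month/year are all hypothetical; in the course these steps would normally be done on pandas columns.

```python
import re

def clean_amount(raw):
    """Strip currency codes, commas, and spaces, then parse as float."""
    return float(re.sub(r"[^0-9.\-]", "", raw))

def normalize_date(raw):
    """Rewrite DD/MM/YYYY or DD-MM-YYYY as ISO YYYY-MM-DD."""
    match = re.fullmatch(r"\s*(\d{1,2})[/-](\d{1,2})[/-](\d{4})\s*", raw)
    if match is None:
        return raw.strip()  # leave unrecognized formats untouched
    day, month, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"

def one_hot(values):
    """One row per value, with a 0/1 indicator for every category."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]

def dummy(values):
    """Like one_hot, but drops the first category as the reference level."""
    categories = sorted(set(values))[1:]
    return [{c: int(v == c) for c in categories} for v in values]

print(clean_amount("PKR 1,250.50 "))   # 1250.5
print(normalize_date("5/3/2023"))      # 2023-03-05
print(one_hot(["red", "green", "red"]))
print(dummy(["red", "green", "red"]))
```

Dummy encoding uses one column fewer than one-hot encoding, which avoids perfectly correlated columns in linear models at the cost of making the reference category implicit.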
III. Business Intelligence (BI) and Big Data Visualization
 Introduction to BI & commonly used BI tools.
 Power BI introduction & its components.
 Power View, Query, Pivot & Power BI Service
 Introduction to Power Query and its usage.
 Basic Power BI Navigation.
Basic Power BI Charts.
 Column Chart.
 Stacked Column Chart.
 Pie Chart.
 Donut Chart.
 Funnel Chart.
 Ribbon Chart.
 Include and Exclude.
 Export data from Visual.
Maps in Power BI Desktop
 Map.
 Filled Map.
 Map with Pie Chart.
 Formatting in Map.
 Background Changes in Map.
 Map of Pakistan in Power BI.
 Map of Australia.
Table & Matrix in Power BI Desktop
 Creating a Simple Table.
 Formatting in Table.
 Conditional Formatting in Table.
 Changing Aggregation in Table.
 Creating a Matrix in Power BI.
 Conditional Formatting in Matrix.
 Automatic Hierarchy in Matrix.
Other Charts in Power BI Desktop
 Line Chart.
 Drill down in Line Chart.
 Area Chart.
 Line vs Column Chart.
 Scatter Plot.
 Waterfall Chart.
Cards and Filters in Power BI Desktop.
 Number Card.
 Text Card.
 Date Card.
 Multi-Row Card.
 Filter on Visual.
 Filter on Page.
 Filter on All Pages.
 Drill through.
Slicers in Power BI Desktop
 Slicer for Text.
 Format Text Slicer.
 Date Slicer.
 Format Date Slicer.
 Number Slicer.
Advanced Charts in Power BI Desktop
 Animated Bar Chart Race.
 Drill Down Donut Chart.
 Drill Down Column Chart.
 Word Cloud.
 Sankey Chart.
 Infographic.
 Play Axis.
 Scroller.
 Sunburst Chart.
 Histogram.
Objects and Actions (Hyperlinks) in PBI
 Insert Image.
 Insert Text.
 Insert Shapes.
 Insert Buttons.
 Action - Web URL.
 Action - Page Navigation.
 Action - Bookmark Action.
 Action - Drill through.
Power BI Service Introduction
 Creating a Superstore Report.
 Create an Account on Power BI Service.
 Publish Report to Power BI Service Account.
 Export (PPT, PDF, PBIX) Report and Share.
 Comment, Share and Subscribe to a report.
 Create a dashboard in Power BI Service.
 Problem in Power BI Dashboard & its solution.
 Automatic Refresh - Data Gateway.
 Exam
IV. Big Data Management Systems with NoSQL Data Stores
 Introduction to NoSQL databases
 Comparison with SQL databases
 Document NoSQL Store
 Introduction and Installation
 Basic commands
 Document NoSQL Data Modeling
 Practice exercises
 Integration of Document NoSQL database with Apache Spark Machine Learning
 Practice exercises
 Graph NoSQL Store
 Introduction and Installation
 Basic commands
 Graph NoSQL Data Modeling
 Integration of Graph NoSQL database with Apache Spark Machine Learning
 Practice exercises
 Key-Value NoSQL Store
 Introduction and Installation
 Exam
V. Machine Learning for Big Data
 What is Machine Learning, and what tools are required for
learning it
 Differences between classification and regression-based problems, supervised and unsupervised categories, and real-world examples
 Machine learning protocol for implementation
 Linear regression; How it works, mathematical and graphical
representation of LR
 Python implementation of Linear regression
 Discuss the performance metrics for regression-based
problems
 Logistic regression: how it works, decision boundaries, sigmoid function
 Python implementation of Logistic regression
 Discuss the performance metrics for classification-based
problems
 A real-life problem for logistic regression
 For regression and classification-based problems
 K-nearest neighbours
 Support vector machine
 Discussion of overfitting and underfitting
 Cross-validation
 Hold out cross-validation
 K-fold and its types of cross-validation
 Leave one out cross-validation
 Bootstrap cross-validation
 Python implementation of Cross validation
 Classification and Regression
 Decision tree
 Random forest
 Parameter behavior of both algorithms
 Overfitting and underfitting handling
 Hyper-parameter
 Unsupervised learning: K-means
 Feature selection; PCA
 Discussion of project
 Live demonstration of a Kaggle submission
 Real-life problem solving
 Final Exam
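To make the linear regression and cross-validation items in this module concrete, here is a minimal sketch in plain Python. It is not the course's implementation: the function names and the toy data are hypothetical, the folds are taken in order without shuffling, and a real workflow would more likely rely on a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(ys, preds):
    """Mean squared error, a common metric for regression problems."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_val_mse(xs, ys, k=5):
    """Fit on each training split and score on the held-out fold."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        scores.append(mse([ys[i] for i in test], preds))
    return scores

xs = list(range(10))
ys = [2 * x + 1 for x in xs]          # hypothetical, perfectly linear data
print(fit_line(xs, ys))
print(cross_val_mse(xs, ys, k=5))
```

Setting `k` equal to the number of samples turns the same splitter into leave-one-out cross-validation, while a single train/test split corresponds to the hold-out approach.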
VI. Case Study
 Infrastructure Development for Real-Time Big Data Analytics
 Streaming Introduction
 Big Data Pipelines: The Rise of Real-Time
Stream processing with Apache Storm
 How does Twitter compute trends
 Improve performance using distributed processing
 Building blocks of Storm Topologies
 Adding Parallelism in a Storm Topology
 Components of Storm Cluster
 A simple Hello World Topology
 Implementing Bolt & Submitting a Topology
Processing Data using Files
 Reading Data from a file
 Representing Data using Tuples
 Accessing Data from Tuples
 Writing Data to a File
 Assignment 1
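The spout, bolt, and tuple vocabulary of this Storm section can be previewed with a single-process simulation. The sketch below does not use the Apache Storm API (Storm topologies are usually written in Java and run on a cluster); the function names and sample tweets are hypothetical and only show how tuples flow from a source through processing steps to compute trending hashtags.

```python
import re
from collections import Counter

def tweet_spout(tweets):
    """Spout: the source of the stream, emitting one tuple per tweet."""
    for text in tweets:
        yield (text,)

def hashtag_bolt(stream):
    """Bolt: emit one (hashtag,) tuple for every hashtag in each tweet."""
    for (text,) in stream:
        for tag in re.findall(r"#\w+", text.lower()):
            yield (tag,)

def count_bolt(stream):
    """Bolt: consume hashtag tuples and return the total count per hashtag."""
    counts = Counter()
    for (tag,) in stream:
        counts[tag] += 1
    return counts

tweets = [
    "Learning #BigData with #Spark",
    "#spark streaming is fun",
    "No tags here",
    "#bigdata #Spark #Kafka",
]

# Wire the "topology": spout -> hashtag bolt -> count bolt.
trending = count_bolt(hashtag_bolt(tweet_spout(tweets)))
print(trending.most_common(2))   # [('#spark', 3), ('#bigdata', 2)]
```

In a real cluster each of these steps would run as many parallel tasks, with tuples for the same hashtag routed to the same counting task so that the counts stay consistent.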
Spark Streaming
 Streaming Architecture
 Deployment of Collection and Message Queuing Tiers
 Introduction of message queuing tier using Apache Kafka
Running The Collection Tier (Part II - Sending Data)
Data Access Tier
 Introduction to Data Access tier - MongoDB
 Exploring Spring Reactive
 Exposing Data Access tier in browser
 Analysis Tier
 Introduction to Analysis tier - Apache Spark
 Plug-in Spark Analysis Tier to Our Pipelines
 A brief overview of Spark RDDs
 Fault Tolerance
 Kafka Connect
 Assignment 2
Brief introduction to
 Lambda vs Kappa architecture
 DataFrame, DataSets, and SparkSQL
 Spark Structured Streaming
Benefits of Kappa architecture.
Building Data Pipelines using Apache Airflow
 Advantages of using DAGs in Apache Airflow
 Apache Airflow UI
 Building DAG using Airflow
 Airflow Monitoring and Logging
 Assignment 3
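One advantage of describing a pipeline as a DAG is that a valid execution order can always be derived from the dependencies, and a cycle can be detected before anything runs. The sketch below illustrates that idea with Python's standard `graphlib` module (Python 3.9 or later); it is not Apache Airflow code, and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "train_model": {"clean"},
    "build_report": {"clean"},
    "publish": {"train_model", "build_report"},
}

def execution_order(dag):
    """Return one ordering in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())

def is_valid_dag(dag):
    """A pipeline is schedulable only if its dependencies contain no cycle."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False

order = execution_order(pipeline)
print(order)
print(is_valid_dag({"a": {"b"}, "b": {"a"}}))   # False: a and b wait on each other
```

Tasks with no dependency between them, such as `train_model` and `build_report` here, may appear in either order, which is also what allows a scheduler to run them in parallel.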
VII. Final Exam
