2. NumPy: NumPy is a library for the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along with a
collection of mathematical functions to operate on these arrays.
Key Features:
Array creation and manipulation
Mathematical operations on arrays
Linear algebra operations
Fourier transforms
Random number generation
Applications:
Scientific computing
Data analysis and manipulation
Machine learning
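The key features listed for NumPy can be tried in a few lines. The following is a minimal sketch, not taken from the slides: the matrix, the sine-wave signal, and the seed are made-up values chosen only for illustration.

```python
import numpy as np

# Array creation and manipulation
a = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns holding 0..5
print(a.T.shape)                 # shape of the transposed array

# Element-wise mathematical operations
print(a * 2 + 1)

# Linear algebra: solve the system M @ x = b
M = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(M, b)
print(x)

# Fourier transform of a sampled sine wave (4 cycles over the window)
t = np.linspace(0.0, 1.0, 64, endpoint=False)
signal = np.sin(2 * np.pi * 4 * t)
spectrum = np.abs(np.fft.rfft(signal))
print(spectrum.argmax())         # index of the dominant frequency bin

# Random number generation with a fixed seed for repeatability
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
print(samples.shape)
```

Each section maps to one bullet in the feature list, so the blocks can be run one at a time in separate notebook cells.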
3. How to install NumPy on Jupyter?
Open a Jupyter notebook and run the following code:
!pip install numpy
import numpy as np
Then try the following code:
n = np.array((1, 2, 3))
print(n)
Type of the object:
print(type(n))
4. OpenCV (Open Source Computer Vision Library):
OpenCV is an open-source computer vision and machine learning software
library. It provides a wide range of functionalities for real-time computer vision,
including image and video processing, object detection, face recognition, and
more.
Key Features:
Image and video I/O
Image processing algorithms
Object detection and tracking
Machine learning algorithms for computer vision tasks
Applications:
Robotics
Augmented reality
Surveillance systems
Medical image analysis
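5. How to install OpenCV on Jupyter?
Open a Jupyter notebook and run the following code:
!pip install opencv-python
import cv2
# Load an image from a file (imread returns None if the file cannot be read)
img = cv2.imread("img1.png")
# Show the image in a window titled "MRK"
cv2.imshow("MRK", img)
# Keep the window open for 10 seconds (10000 ms) or until a key is pressed
cv2.waitKey(10000)
cv2.destroyAllWindows()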
6. Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python. It provides a MATLAB-like interface and
supports a wide variety of plots and graphs.
Key Features:
Line plots, scatter plots, and histograms
2D and 3D plotting
Customization of plots
Integration with NumPy arrays
Applications:
Data visualization
Scientific plotting
Statistical analysis
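A short sketch of the scatter, line, and histogram features with basic customization follows. The data is randomly generated for illustration, the output file name demo_plot.png is hypothetical, and the Agg backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: a noisy straight line
rng = np.random.default_rng(seed=1)
x = np.linspace(0, 10, 50)
y = 2 * x + rng.normal(scale=2.0, size=x.size)

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Scatter plot with a reference line and customized labels
ax_left.scatter(x, y, s=12, color="tab:blue", label="noisy samples")
ax_left.plot(x, 2 * x, color="tab:red", linestyle="--", label="y = 2x")
ax_left.set_xlabel("x")
ax_left.set_ylabel("y")
ax_left.set_title("Scatter and line")
ax_left.legend()

# Histogram of the residuals around the reference line
ax_right.hist(y - 2 * x, bins=10, color="tab:green")
ax_right.set_title("Residuals")

fig.tight_layout()
fig.savefig("demo_plot.png")  # hypothetical output file name
```

Inside Jupyter, plt.show() can replace the savefig call to display the figure inline.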
7. How to install Matplotlib on Jupyter?
Open a Jupyter notebook and run the following code:
!pip install matplotlib
import matplotlib.pyplot as plt  # "as" gives the module a short alias
import numpy as np
# Two points: (0, 0) and (4, 6)
xpts = np.array([0, 4])
ypts = np.array([0, 6])
# Draw a line between the two points and display it
plt.plot(xpts, ypts)
plt.show()
8. scikit-image, commonly abbreviated as skimage, is an open-source image
processing library for Python.
It provides a collection of algorithms for image segmentation, feature extraction,
image filtering, and other image processing tasks.
Image Processing
Integration: It seamlessly integrates with other scientific Python libraries such
as NumPy, SciPy, and Matplotlib, allowing for efficient image manipulation and
analysis.
User-Friendly API
Community Support: skimage benefits from an active community of developers
and users.
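To make the filtering, segmentation, and feature-extraction tasks concrete, here is a minimal sketch on a synthetic image. The image contents and the choice of Sobel and Otsu are assumptions made for illustration; the slides do not name specific algorithms.

```python
import numpy as np
from skimage import filters, measure

# Synthetic grayscale image: two bright rectangles on a dark, slightly noisy background
rng = np.random.default_rng(seed=0)
image = 0.05 * rng.random((64, 64))
image[8:24, 8:24] += 1.0
image[40:56, 36:60] += 1.0

# Image filtering: Sobel edge magnitude
edges = filters.sobel(image)

# Segmentation: Otsu threshold, then label the connected regions
threshold = filters.threshold_otsu(image)
mask = image > threshold
labels, count = measure.label(mask, return_num=True)

# Feature extraction: pixel area of each labelled region
areas = [region.area for region in measure.regionprops(labels)]
print(count, areas)
```

Because the image is a plain NumPy array, the same arrays can be passed directly to SciPy or Matplotlib, which is the integration the slide refers to.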
9. Installing scikit-image library:
pip install scikit-image
import skimage
from skimage import io
# Load an image from a file
image = io.imread('example_image.jpg')
# Display the image
io.imshow(image)
io.show()
10. Pillow is a Python Imaging Library (PIL) fork, which adds extensive image processing
capabilities to Python. It provides support for opening, manipulating, and saving many
different image file formats.
Image Manipulation: Pillow offers a wide range of image handling functionalities such
as resizing, cropping, rotating, filtering, and enhancing images.
Image File Support: It supports various image file formats including JPEG, PNG, GIF,
etc. making it suitable for handling varied image data.
Integration: Pillow seamlessly integrates with other Python libraries such as NumPy
and Matplotlib, enabling easy interoperability with scientific computing and data
visualization tools.
Ease of Use: Pillow provides a simple and intuitive API for working with images,
making it accessible to users with varying levels of programming experience.
Activeness: Pillow is actively maintained and updated, ensuring compatibility with the
latest Python versions and continued support for new features and improvements.
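The manipulation features named above (cropping, rotating, filtering, enhancing) can be sketched as follows. The image is generated in memory, and the sizes, color, and factors are made-up values for illustration.

```python
from PIL import Image, ImageEnhance, ImageFilter

# Synthetic 200x100 image so the sketch runs without an input file
image = Image.new("RGB", (200, 100), color=(200, 60, 60))

# Cropping: the box is (left, upper, right, lower)
cropped = image.crop((50, 25, 150, 75))

# Rotating: expand=True grows the canvas to fit the rotated image
rotated = image.rotate(90, expand=True)

# Filtering: Gaussian blur with a 2-pixel radius
blurred = image.filter(ImageFilter.GaussianBlur(radius=2))

# Enhancing: a factor above 1 brightens the image
brighter = ImageEnhance.Brightness(image).enhance(1.5)

print(cropped.size, rotated.size, blurred.size)
print(brighter.getpixel((0, 0)))
```

Each call returns a new Image object and leaves the original unchanged, so the operations can be chained or compared side by side.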
11. Installing Pillow library:
pip install pillow
from PIL import Image
# Open an image file
original_image = Image.open("example.jpg")
# Display basic information about the image
print("Original Image Format:", original_image.format)
print("Original Image Size:", original_image.size)
# Resize the image
new_size = (original_image.size[0] // 2, original_image.size[1] // 2)  # Reduce size by half
resized_image = original_image.resize(new_size)
# Display new size
print("Resized Image Size:", resized_image.size)
# Save the resized image with a new name
resized_image.save("resized_example.jpg")
# Close the original and resized images
original_image.close()
resized_image.close()
print("Resized image saved successfully!")
12. Pandas is a powerful Python library for data manipulation and analysis. It
offers data structures and functions to efficiently work with structured data like
time series, tabular, and heterogeneous data.
Data Structures: Pandas provides two main data structures: Series (1D labeled
array) and DataFrame (2D labeled data structure), which offer powerful data
manipulation capabilities.
Data Handling: It offers functionalities for reading and writing data from
various formats like CSV, Excel, SQL databases etc.
Data Analysis: Pandas supports data analysis tasks including data cleaning,
filtering, grouping, merging, and reshaping, making it indispensable for
exploratory data analysis.
Integration: It seamlessly integrates with other Python libraries such as
NumPy, Matplotlib, and scikit-learn, enhancing its capabilities in scientific
computing and machine learning tasks.
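As a minimal sketch of the two data structures and a few of the analysis operations listed above (filtering, grouping, merging), assuming Pandas is installed; all names and values here are invented for illustration.

```python
import pandas as pd

# Series: a 1D labeled array (the index labels are invented)
ages = pd.Series([25, 30, 35], index=["John", "Alice", "Bob"], name="Age")

# DataFrame: a 2D labeled table (hypothetical sample data)
people = pd.DataFrame({
    "Name": ["John", "Alice", "Bob", "Anna"],
    "City": ["Chicago", "Chicago", "Seattle", "Seattle"],
    "Age": [25, 30, 35, 40],
})
cities = pd.DataFrame({
    "City": ["Chicago", "Seattle"],
    "State": ["IL", "WA"],
})

# Filtering: keep rows where Age is above 28
older = people[people["Age"] > 28]

# Grouping: mean age per city
mean_age_by_city = people.groupby("City")["Age"].mean()

# Merging: add the State column by joining on City
merged = people.merge(cities, on="City", how="left")

print(ages["Alice"])
print(older)
print(mean_age_by_city)
print(merged)
```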
13. Installing Pandas library:
pip install pandas
Sometimes pip reports that a newer version of pip is available. In that case,
upgrade pip with:
python.exe -m pip install --upgrade pip
import pandas as pd
# Read a CSV file into a DataFrame
df = pd.read_csv("example.csv")
# Display the first few rows of the DataFrame
print("First few rows of the DataFrame:")
print(df.head())
# Display summary information about the DataFrame
print("\nSummary information:")
df.info()  # info() prints its report directly and returns None
# Display basic statistics of numerical columns
print("\nBasic statistics:")
print(df.describe())
14. Definition: scikit-learn is a versatile machine learning library for Python. It offers
simple and efficient tools for data mining and data analysis, implementing a wide
range of machine learning algorithms.
Machine Learning Algorithms: scikit-learn provides implementations of algorithms for
classification, regression, clustering, and dimensionality reduction, along with
utilities for model selection.
Model Evaluation: It offers tools for model evaluation, cross-validation, and
hyperparameter tuning, facilitating the development of robust and accurate machine
learning models.
Integration: scikit-learn seamlessly integrates with other Python libraries such as
NumPy, SciPy, and Pandas, enabling easy preprocessing, training, and evaluation of
machine learning models.
Scalability: It is designed to be scalable and efficient, making it suitable for working
with large datasets and complex models.
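The model evaluation tools mentioned above can be sketched as follows. This is an illustration only, assuming scikit-learn is installed; the choice of a k-nearest neighbors classifier, the 5 folds, and the candidate values for n_neighbors are arbitrary and not taken from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Cross-validation: score one model on 5 different train/test splits
knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())

# Hyperparameter tuning: try several values of n_neighbors
# and keep the one with the best cross-validated accuracy
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```

Cross-validation gives a steadier estimate than a single train/test split because every sample is used for testing exactly once.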
15. Installing scikit-learn library:
pip install scikit-learn
import sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
# Initialize the Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the classifier
rf_classifier.fit(X_train, y_train)
# Predict on the test set
y_pred = rf_classifier.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Display classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred,
                            target_names=iris.target_names))
16. Seaborn is a Python library for creating attractive statistical graphics.
Statistical Visualization: Seaborn excels in generating plots like scatter plots,
bar charts, and heatmaps for effective data exploration.
Integration with Pandas: It seamlessly works with Pandas DataFrames,
making data visualization straightforward.
Customization: Users can easily customize plot aesthetics to suit their
preferences.
Statistical Analysis: Seaborn offers tools for visualizing relationships between
variables and conducting statistical analysis.
Community and Documentation: Supported by an active community and
comprehensive documentation for easy learning.
17. Installing seaborn library:
pip install seaborn
import seaborn as sns
import matplotlib.pyplot as plt
# Load the Iris dataset as a DataFrame
# (sns.load_dataset downloads it, so an internet connection is needed on first use)
iris_df = sns.load_dataset("iris")
# Create a pairplot using Seaborn
sns.pairplot(iris_df, hue='species', palette='Set1')
# Add title (y=1.02 places it just above the grid of plots)
plt.suptitle("Pairplot of Iris Dataset", y=1.02)
# Show the plot
plt.show()
18. Plotly is a Python library for creating interactive and publication-quality graphs.
Interactive Visualization: Plotly allows users to interactively explore data
through zooming and hovering over data points.
Online Platform: It offers an online platform for hosting and sharing interactive
plots.
Chart Types: Supports a wide range of chart types including scatter plots, line
plots, and 3D surface plots.
Integration: Easily integrates with other Python libraries for seamless data
manipulation and visualization.
Customization: Provides extensive options for customizing plot appearance for
tailored visualizations.
19. Installing plotly library:
pip install plotly
import plotly.graph_objects as go
# Sample data
x_values = [1, 2, 3, 4, 5]
y_values = [2, 3, 5, 7, 11]
# Create a line plot
fig = go.Figure(data=go.Scatter(x=x_values, y=y_values,
mode='lines'))
# Add title and axis labels
fig.update_layout(title='Simple Line Plot',
xaxis_title='X-axis',
yaxis_title='Y-axis')
# Show the plot
fig.show()
20. Data Preprocessing:
Data preprocessing is a critical step in machine learning pipelines.
It is defined as the set of techniques and procedures used to prepare raw
data for analysis.
It involves several tasks such as importing and exporting data,
cleaning and formatting data, handling missing values, and feature
scaling.
Importing and Exporting Data:
Importing data involves loading datasets into the machine learning
environment.
This can be done with the Pandas library in Python, using functions such as
read_csv() for CSV files, read_excel() for Excel files, etc.
import pandas as pd
df = pd.read_csv("ML.csv")  # the file name must be a quoted string
df.shape  # show the number of rows and columns
df.describe()  # calculate the mean, standard deviation, etc.
21. Exporting Data:
import pandas as pd
# Example DataFrame
data = {
'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Export DataFrame to CSV
df.to_csv('output.csv', index=False)
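One way to check an export is to read the file back and compare it with the original. The sketch below assumes Pandas is installed and writes to a temporary folder (an addition for this illustration, not part of the slides) so that no files are left behind.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Alice", "Bob"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Write to a temporary folder so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "output.csv")
    df.to_csv(csv_path, index=False)  # export without the row index
    loaded = pd.read_csv(csv_path)    # import the same file again

print(loaded.shape)
print(loaded)
```

Passing index=False matters here: without it, the row index is written as an extra unnamed column and shows up again when the file is read back.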
22. Cleaning and Formatting Data:
Cleaning data involves identifying and handling anomalies, inconsistencies,
and errors in the dataset.
This may include removing duplicates, correcting data types, dealing with
outliers, etc.
Formatting data involves ensuring that data is in the appropriate format for
analysis.
For example, converting categorical variables into numerical representations,
standardizing date formats, etc.
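A short sketch of some of these steps (removing duplicates, correcting data types, encoding a categorical column, and parsing dates), assuming Pandas is installed. The data, the Joined column, and the 0/1 coding of Gender are hypothetical; outlier handling is not covered here.

```python
import pandas as pd

# Hypothetical raw data with a duplicate row, numbers stored as text,
# and dates stored as text
raw = pd.DataFrame({
    "Name": ["John", "Alice", "Alice", "Bob"],
    "Age": ["25", "30", "30", "35"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Joined": ["2021-01-15", "2020-07-01", "2020-07-01", "2019-11-30"],
})

# Remove exact duplicate rows
clean = raw.drop_duplicates().reset_index(drop=True)

# Correct data types: Age from text to integer, Joined from text to datetime
clean["Age"] = clean["Age"].astype(int)
clean["Joined"] = pd.to_datetime(clean["Joined"], format="%Y-%m-%d")

# Convert a categorical column into a numerical representation
clean["Gender_code"] = clean["Gender"].map({"Male": 0, "Female": 1})

print(clean)
print(clean.dtypes)
```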
23. Cleaning and Formatting Data (example):
import pandas as pd
# Load the dataset
data = {
    'Name': ['John', 'Alice', 'Bob', 'Anna', 'Mike', 'Emily'],
    'Age': [25, 30, None, 35, 40, ''],
    'City': ['New York', 'Los Angeles', 'Chicago', 'San Francisco', '',
             'Seattle'],
    'Gender': ['Male', 'Female', 'Male', '', 'Male', 'Female'],
    'Salary': ['$50000', '$60000', '$70000', '$80000', '90000', '$100000']
}
df = pd.DataFrame(data)
# Display the original DataFrame
print("Original DataFrame:")
print(df)
print()
# Clean and format the data
# 1. Convert Age to numeric and fill missing values with the median age
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
median_age = df['Age'].median()  # Calculate median age
df['Age'] = df['Age'].fillna(median_age)  # Fill missing values with median
# 2. Remove rows with missing or empty values in City and Gender columns
df = df[df['City'].notna() & (df['City'] != '') &
        df['Gender'].notna() & (df['Gender'] != '')].copy()
# 3. Remove dollar signs and commas from Salary, then convert to numeric
df['Salary'] = df['Salary'].replace('[$,]', '', regex=True).astype(float)
# Display the cleaned and formatted DataFrame
print("Cleaned and Formatted DataFrame:")
print(df)
24. Handling Missing Values:
Missing values are common in datasets and can significantly affect the
performance of machine learning models if not handled properly.
Techniques for handling missing values include:
Imputation: Replacing missing values with a calculated or estimated value
(e.g., mean, median, mode).
Deletion: Removing rows or columns with missing values.
Advanced techniques like predictive modeling to estimate missing values
based on other features.
The example is the same as the previous one.
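The previous example fills gaps with the median only. Deletion and imputation with the mean and the mode can be sketched in the same way, assuming Pandas and NumPy are installed; the small table below is invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [25.0, np.nan, 35.0, 40.0],
    "City": ["Chicago", "Seattle", None, "Chicago"],
})

# Count the missing values in each column
print(df.isna().sum())

# Deletion: drop every row that has at least one missing value
dropped = df.dropna()

# Imputation: mean for the numeric column, mode for the text column
imputed = df.copy()
imputed["Age"] = imputed["Age"].fillna(imputed["Age"].mean())
imputed["City"] = imputed["City"].fillna(imputed["City"].mode()[0])

print(dropped)
print(imputed)
```

Deletion is simple but discards half of this small table, while imputation keeps every row at the cost of inserting estimated values.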
25. Feature Scaling:
Feature scaling is the process of standardizing or normalizing the range of
independent variables or features in the dataset.
It is essential for algorithms that are sensitive to the scale of the input
features, such as gradient descent-based algorithms (e.g., linear regression,
logistic regression) or distance-based algorithms (e.g., k-nearest neighbors,
support vector machines).
Common techniques for feature scaling include:
Min-Max Scaling: Scaling features to a fixed range, usually [0, 1].
Standardization (Z-score normalization): Scaling features so that they have
the properties of a standard normal distribution with a mean of 0 and a
standard deviation of 1.
Robust Scaling: Scaling features using statistics that are robust to outliers,
such as the median and interquartile range.
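The three techniques can be written out directly with NumPy, as in the sketch below. The feature values are made up, with 100 acting as an outlier. scikit-learn provides the same transformations as MinMaxScaler, StandardScaler, and RobustScaler in sklearn.preprocessing.

```python
import numpy as np

# Hypothetical feature values; 100 acts as an outlier
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max scaling to the range [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): mean 0, standard deviation 1
z_score = (x - x.mean()) / x.std()

# Robust scaling: center on the median, divide by the interquartile range
q1, median, q3 = np.percentile(x, [25, 50, 75])
robust = (x - median) / (q3 - q1)

print("Min-max:", min_max)
print("Z-score:", z_score)
print("Robust: ", robust)
```

With min-max scaling the outlier squeezes the other four values close to 0, whereas robust scaling keeps them evenly spread because the median and interquartile range are barely affected by the outlier.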