Data Analysis and Visualisation with Python
Lewys Brace
l.brace@Exeter.ac.uk
Q-Step Workshop  06/11/2019
Numerical Python (NumPy)
 NumPy is the most foundational package for numerical computing in Python.
 If you are going to work on data analysis or machine learning projects, then
having a solid understanding of NumPy is nearly mandatory.
 Indeed, many other libraries, such as pandas and scikit-learn, use NumPy's array objects as the lingua franca for data exchange.
 NumPy is so important for numerical computation largely because it is designed for efficiency on large arrays of data. The reasons for this include:
- It stores data internally in a contiguous block of memory, independent of other built-in Python objects.
- It performs complex computations on entire arrays without the need for Python for loops.
What you'll find in NumPy
 ndarray: an efficient multidimensional array providing fast array-orientated
arithmetic operations and flexible broadcasting capabilities.
 Mathematical functions for fast operations on entire arrays of data without
having to write loops.
 Tools for reading/writing array data to disk and working with memory-
mapped files.
 Linear algebra, random number generation, and Fourier transform
capabilities.
 A C API for connecting NumPy with libraries written in C, C++, and FORTRAN. This is one reason why Python has become a language of choice for wrapping legacy codebases.
The NumPy ndarray: A multi-dimensional array object
 The NumPy ndarray object is a fast and flexible container for large
data sets in Python.
 NumPy arrays are a bit like Python lists, but are still a very different
beast at the same time.
 Arrays enable you to store multiple items of the same data type. It is the facilities around the array object that make NumPy so convenient for performing maths and data manipulations.
Ndarray vs. lists
 By now, you are familiar with Python lists and how incredibly useful
they are.
 So, you may be asking yourself:
I can store numbers and other objects in a Python list and do all sorts
of computations and manipulations through list comprehensions, for-
loops etc. What do I need a NumPy array for?
 There are significant advantages to using NumPy arrays over lists.
Creating a NumPy array
 To understand these advantages, let's create an array.
 One of the most common of the many ways to create a NumPy array is from a list, by passing it to the np.array() function.
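A minimal sketch of creating an array from a list, assuming a small hypothetical list called list1:

```python
import numpy as np

# A plain Python list of numbers (hypothetical example data)
list1 = [0, 1, 2, 3, 4]

# Passing the list to np.array() creates a one-dimensional ndarray
arr1d = np.array(list1)

print(type(arr1d))  # <class 'numpy.ndarray'>
print(arr1d)        # [0 1 2 3 4]
```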
Differences between lists and ndarrays
 The key difference between an array and a list is that arrays are designed to handle vectorised operations, while Python lists are not.
 That means that if you apply a function to an array, it is performed on every item in the array, rather than on the whole array object.
 Let's suppose you want to add the number 2 to every item in a list. The intuitive way to do this is to write something like [1, 2, 3] + 2.
 That is not possible with a list (Python raises a TypeError), but you can do exactly that with an array.
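A short sketch contrasting the two, again using a hypothetical list1; the list version needs an explicit comprehension, while the array version is vectorised:

```python
import numpy as np

list1 = [0, 1, 2, 3, 4]  # hypothetical example data

# Adding an int to a list is not element-wise: Python raises a TypeError
try:
    list1 + 2
except TypeError as err:
    print("list + 2 failed:", err)

# The same operation on an ndarray is applied to every element
arr1d = np.array(list1)
plus_two = arr1d + 2
print(plus_two)  # [2 3 4 5 6]

# The list equivalent needs an explicit loop or comprehension
plus_two_list = [x + 2 for x in list1]
```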
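 It should be noted here that, once a NumPy array is created, you cannot simply increase its size the way you can append to a list.
 To do so, you will have to create a new array; functions such as np.append() return a new array rather than modifying the original.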
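A small sketch showing that np.append() leaves the original array untouched and hands back a new, larger one:

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append does not grow `arr` in place; it returns a new, larger array
bigger = np.append(arr, [4, 5])

print(arr)     # [1 2 3]  (unchanged)
print(bigger)  # [1 2 3 4 5]
```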
Create a 2D array from a list of lists
 You can pass a list of lists to create a matrix-like 2D array.
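A minimal sketch, assuming a hypothetical 3 x 3 list of lists called list2; each inner list becomes one row:

```python
import numpy as np

# A list of lists: each inner list becomes one row (hypothetical values)
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

arr2d = np.array(list2)
print(arr2d)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
print(arr2d.shape)  # (3, 3)
```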
The dtype argument
 You can specify the data type by setting the dtype argument.
 Some of the most commonly used NumPy dtypes are: float, int, bool, str, and object.
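A sketch of the dtype argument, reusing the hypothetical list2 from the 2D example; here the values are stored as floating-point numbers:

```python
import numpy as np

list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # hypothetical values

# Ask NumPy to store the values as floating-point numbers
arr2d_f = np.array(list2, dtype="float")
print(arr2d_f.dtype)  # float64
```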
The astype method
 You can also convert it to a different data type using the astype method.
 Remember that, unlike lists, all items in an array have to be of the same
type.
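A sketch of astype(), assuming a small hypothetical float array; note that astype() returns a converted copy and every element of the result shares the new type:

```python
import numpy as np

arr2d_f = np.array([[0, 1, 2], [3, 4, 5]], dtype="float")

# astype() returns a converted copy; the original array is left unchanged
as_int = arr2d_f.astype("int")
as_str = arr2d_f.astype("int").astype("str")

print(as_int.dtype)  # an integer dtype, e.g. int64 (platform dependent)
print(as_str)
```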
dtype=object
 However, if you are uncertain about what data type your array will
hold, or if you want to hold characters and numbers in the same array,
you can set the dtype as 'object'.
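A minimal sketch comparing dtype='object' with NumPy's default behaviour on mixed inputs (the values are hypothetical):

```python
import numpy as np

# dtype="object" lets one array hold items of different Python types
mixed = np.array([1, "a", 2.5], dtype="object")
print(mixed)

# Without it, NumPy coerces everything to one common type (strings here)
coerced = np.array([1, "a", 2.5])
print(coerced.dtype.kind)  # U (Unicode string)
```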
The tolist() method
 You can always convert an array into a list using the tolist() method.
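A short sketch of tolist() on a hypothetical 2D array; nested dimensions become nested lists and the elements become plain Python numbers:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# tolist() converts the array (including nested dimensions) back into Python lists
as_list = arr.tolist()
print(as_list)        # [[1, 2], [3, 4]]
print(type(as_list))  # <class 'list'>
```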
Inspecting a NumPy array
 There is a range of attributes built into NumPy arrays that allow you to inspect different aspects of an array.
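A sketch of four commonly used attributes (shape, dtype, size, ndim); this particular selection is an assumption, chosen as typical examples:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]], dtype="float")

print("Shape:", arr2d.shape)           # Shape: (2, 3)
print("Datatype:", arr2d.dtype)        # Datatype: float64
print("Size:", arr2d.size)             # Size: 6
print("Num dimensions:", arr2d.ndim)   # Num dimensions: 2
```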
Extracting specific items from an array
 You can extract portions of the array using indices, much like when you're working with lists.
 Unlike lists, however, arrays can optionally accept as many parameters in the square brackets as there are dimensions.
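A sketch with a hypothetical 3 x 3 array: one index (or slice) per dimension inside a single pair of brackets, compared with chained indexing on the equivalent list of lists:

```python
import numpy as np

arr2d = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

# One index per dimension: rows first, then columns
print(arr2d[1, 2])    # 5

# Slices work per dimension too: first two rows, first two columns
top_left = arr2d[:2, :2]
print(top_left)
# [[0 1]
#  [3 4]]

# The list-of-lists equivalent needs chained indexing
list2 = arr2d.tolist()
print(list2[1][2])    # 5
```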
Boolean indexing
 A boolean index array has the same shape as the array to be filtered, but it only contains True and False values.
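A minimal sketch of boolean indexing on a hypothetical array: a comparison builds the mask, and indexing with the mask keeps only the True positions:

```python
import numpy as np

arr2d = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

# The comparison produces a boolean array with the same shape as arr2d
mask = arr2d > 4
print(mask)
# [[False False False]
#  [False False  True]
#  [ True  True  True]]

# Indexing with the mask keeps only the elements where the mask is True
print(arr2d[mask])  # [5 6 7 8]
```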
Pandas
 Pandas, like NumPy, is one of the most popular Python libraries for
data analysis.
 It is a high-level abstraction built on top of NumPy, whose core is written in C.
 Pandas provides high-performance, easy-to-use data structures and
data analysis tools.
 There are two main structures used by pandas: data frames and series.
Indices in a pandas series
 A pandas series is similar to a list, but differs in that a series associates a label with each element. This makes it look like a dictionary.
 If an index is not explicitly provided by the user, pandas creates a RangeIndex ranging from 0
to N-1.
 Each series object also has a data type.
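A sketch assuming a hypothetical series built from a plain list; with no index supplied, pandas attaches a RangeIndex, and the series carries a dtype:

```python
import pandas as pd

# A Series built from a plain list (hypothetical values)
s = pd.Series([10, 20, 30, 40])

print(s.index)  # RangeIndex(start=0, stop=4, step=1)
print(s.dtype)  # int64
```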
 As you may suspect by this point, a series has ways to extract all of
the values in the series, as well as individual elements by index.
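A short sketch, with the same hypothetical series: .values returns all values as a NumPy array, and square brackets return one element by its index label:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# .values returns the underlying data as a NumPy array
print(s.values)  # [10 20 30 40]

# Individual elements can be retrieved by index label
print(s[2])      # 30
```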
 You can also provide an index manually.
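A sketch of a manually supplied index; the labels "a" to "d" are hypothetical:

```python
import pandas as pd

# Supplying our own labels instead of the default RangeIndex (hypothetical data)
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s["c"])  # 30
```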
 It is easy to retrieve several elements of a series by their indices or
make group assignments.
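A sketch of both ideas on the hypothetical labelled series: a list of labels retrieves several elements, and the same kind of list on the left of = assigns to several elements at once:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Retrieve several elements at once by passing a list of labels
subset = s[["a", "c"]]
print(subset.tolist())  # [10, 30]

# Group assignment: set several elements to the same value in one step
s[["b", "d"]] = 0
print(s.tolist())       # [10, 0, 30, 0]
```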
Filtering and maths operations
 Filtering and maths operations are easy with Pandas as well.
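A minimal sketch on the hypothetical labelled series: a boolean condition filters, and arithmetic works element by element while keeping the labels:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Filtering with a boolean condition keeps only matching elements
print(s[s > 15].tolist())   # [20, 30, 40]

# Arithmetic is applied element-wise, and labels are preserved
doubled = s * 2
print(doubled.tolist())     # [20, 40, 60, 80]
print(list(doubled.index))  # ['a', 'b', 'c', 'd']
```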
Pandas data frame
 Simplistically, a data frame is a table, with rows and columns.
 Each column in a data frame is a series object.
 Each row consists of the elements at the same index position across those series.
Case ID | Variable one | Variable two | Variable three
1 | 123 | ABC | 10
2 | 456 | DEF | 20
3 | 789 | XYZ | 30
Creating a Pandas data frame
 Pandas data frames can be constructed using Python dictionaries.
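A sketch that rebuilds the example table above from a dictionary; each key becomes a column name and each list becomes that column's values:

```python
import pandas as pd

# Values taken from the example table; each key becomes a column
data = {
    "Variable one": [123, 456, 789],
    "Variable two": ["ABC", "DEF", "XYZ"],
    "Variable three": [10, 20, 30],
}
df = pd.DataFrame(data)
print(df)
```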
 You can also create a data frame from a list.
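A sketch of the same table built from a list of rows; in this form the column names are passed separately through the columns argument:

```python
import pandas as pd

# A list of rows (lists); column names are supplied separately
rows = [[123, "ABC", 10], [456, "DEF", 20], [789, "XYZ", 30]]
df = pd.DataFrame(rows, columns=["Variable one", "Variable two", "Variable three"])
print(df.shape)  # (3, 3)
```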
 You can ascertain the type of a column with the type() function.
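A short sketch with a hypothetical two-column frame: type() confirms that a single column is a Series, and its dtype attribute gives the type of the values inside it:

```python
import pandas as pd

df = pd.DataFrame({"Variable one": [123, 456, 789], "Variable two": ["ABC", "DEF", "XYZ"]})

# Selecting a single column returns a pandas Series
print(type(df["Variable one"]))  # <class 'pandas.core.series.Series'>

# The dtype attribute gives the data type of the values in that column
print(df["Variable one"].dtype)  # int64
```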
 A Pandas data frame object has two indices: a column index and a row index.
 Again, if you do not provide one, Pandas will create a RangeIndex from 0
to N-1.
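A sketch with a hypothetical one-column frame, showing the default RangeIndex on the rows alongside the column index:

```python
import pandas as pd

df = pd.DataFrame({"Variable one": [123, 456, 789]})

# With no index supplied, the rows get a RangeIndex from 0 to N-1
print(df.index)          # RangeIndex(start=0, stop=3, step=1)

# The column labels form the second (column) index
print(list(df.columns))  # ['Variable one']
```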
 There are numerous ways to provide row indices explicitly.
 For example, you could provide an index when creating a data frame:
 Alternatively, you can set it after the data frame has been created.
 Here, I also named the index "country code".
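A sketch of both options; the row labels r1 to r3 and the country codes KZ, RU and BY are hypothetical placeholders:

```python
import pandas as pd

data = {"Variable one": [123, 456, 789], "Variable two": ["ABC", "DEF", "XYZ"]}

# Option 1: provide the row index when the data frame is created
df = pd.DataFrame(data, index=["r1", "r2", "r3"])

# Option 2: replace the index after creation and give it a name
df.index = ["KZ", "RU", "BY"]  # hypothetical country codes
df.index.name = "country code"
print(df)
```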
 Row access using the index can be performed in several ways.
 First, you could use .loc[] and provide an index label.
 Second, you could use .iloc[] and provide an integer position.
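A sketch with hypothetical row labels: .loc[] looks rows up by label, .iloc[] by 0-based position, and both use square brackets rather than parentheses:

```python
import pandas as pd

df = pd.DataFrame(
    {"Variable one": [123, 456, 789], "Variable two": ["ABC", "DEF", "XYZ"]},
    index=["KZ", "RU", "BY"],  # hypothetical row labels
)

# .loc[] selects by index label
row_by_label = df.loc["RU"]

# .iloc[] selects by integer position (0-based)
row_by_position = df.iloc[1]

print(row_by_label.equals(row_by_position))  # True: both are the second row
```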
 Particular rows and columns can be selected this way.
 You can give .loc[] two arguments, a list of row labels and a list of column labels; slicing is supported as well.
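A sketch of .loc[] with a row list plus a column list, and with label slices (which, unlike positional slices, include both endpoints). Row labels are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Variable one": [123, 456, 789],
        "Variable two": ["ABC", "DEF", "XYZ"],
        "Variable three": [10, 20, 30],
    },
    index=["KZ", "RU", "BY"],  # hypothetical row labels
)

# A list of row labels and a list of column labels
subset = df.loc[["KZ", "BY"], ["Variable one", "Variable three"]]
print(subset)

# Label slices are inclusive of both ends with .loc[]
sliced = df.loc["KZ":"RU", "Variable one":"Variable two"]
print(sliced.shape)  # (2, 2)
```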
Filtering
 Filtering is performed using so-called Boolean arrays.
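A minimal sketch of filtering with a Boolean array on a hypothetical frame: the comparison yields one True/False per row, and indexing the frame with it keeps the True rows:

```python
import pandas as pd

df = pd.DataFrame({"Variable one": [123, 456, 789], "Variable three": [10, 20, 30]})

# The comparison yields a Boolean Series, one True/False per row
mask = df["Variable three"] > 15
print(mask.tolist())  # [False, True, True]

# Using it inside [] keeps only the rows where the mask is True
filtered = df[mask]
print(filtered["Variable one"].tolist())  # [456, 789]
```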
Deleting columns
 You can delete a column using the drop() function.
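A sketch of drop() on a hypothetical frame; by default it returns a new data frame and leaves the original unchanged. Both the columns= keyword and the older axis=1 form are shown:

```python
import pandas as pd

df = pd.DataFrame(
    {"Variable one": [123, 456, 789], "Variable two": ["ABC", "DEF", "XYZ"], "Variable three": [10, 20, 30]}
)

# drop() returns a new data frame without the named column(s); df is unchanged
reduced = df.drop(columns=["Variable two"])
print(list(reduced.columns))  # ['Variable one', 'Variable three']

# The axis-based spelling does the same thing
reduced_too = df.drop("Variable two", axis=1)
```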
Reading from and writing to a file
 Pandas supports many popular file formats including CSV, XML, HTML,
Excel, SQL, JSON, etc.
 Out of all of these, CSV is the file format that you will work with the
most.
 You can read in the data from a CSV file using the read_csv() function.
 Similarly, you can write a data frame to a CSV file with the to_csv() method.
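A round-trip sketch, assuming a temporary file stands in for a real path on your machine (the file name example.csv is hypothetical):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Case ID": [1, 2, 3], "Variable one": [123, 456, 789]})

# Write to a temporary CSV file; index=False skips writing the RangeIndex
path = os.path.join(tempfile.mkdtemp(), "example.csv")
df.to_csv(path, index=False)

# Read it back in; index_col=0 makes the first column the row index
loaded = pd.read_csv(path, index_col=0)
print(loaded)
```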
 Pandas has the capacity to do much more than what we have covered
here, such as grouping data and even data visualisation.
 However, as with NumPy, we don't have enough time to cover every aspect of pandas here.
Exploratory data analysis (EDA)
Exploring your data is a crucial step in data analysis. It involves:
 Organising the data set
 Plotting aspects of the data set
 Maybe producing some numerical summaries; central tendency and
spread, etc.
Exploratory data analysis can never be the whole story, but nothing
else can serve as the foundation stone.
- John Tukey.
Download the data
 Download the Pokemon dataset from:
https://github.com/LewBrace/da_and_vis_python
 Unzip the folder, and save the data file in a location you'll remember.
Reading in the data
 First we import the Python packages we are going to use.
 Then we use Pandas to load in the dataset as a data frame.
NOTE: The index_col argument states that we'll treat the first column of the dataset as the ID column.
NOTE: The encoding argument allows us to bypass a decoding error caused by special characters in the dataset.
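A sketch of both arguments, assuming a tiny hypothetical in-memory CSV instead of the real Pokemon file; its "é" is stored as a Latin-1 byte, which is not valid UTF-8, so the choice of encoding="latin-1" here is an assumption for this sample only:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file (Latin-1 encoded bytes)
raw = "#,Name,Attack\n1,Pokémon A,49\n2,Pokémon B,62\n".encode("latin-1")

try:
    pd.read_csv(io.BytesIO(raw), index_col=0)  # default encoding is UTF-8
except UnicodeDecodeError as err:
    print("Default read failed:", type(err).__name__)

# index_col=0 uses the first column (#) as the row index;
# encoding tells pandas how to decode the bytes
df = pd.read_csv(io.BytesIO(raw), index_col=0, encoding="latin-1")
print(df)
```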
Examine the data set
 We could spend time staring at these
numbers, but that is unlikely to offer
us any form of insight.
 We could begin by conducting all of
our statistical tests.
 However, a good field commander never goes into battle without first doing a reconnaissance of the terrain.
 This is exactly what EDA is for.
Plotting a histogram in Python
Bins
 You may have noticed that the two histograms we've seen so far look different, despite using exactly the same data.
 This is because they have different bin values.
 The left graph used the default bins generated by plt.hist(), while the one on the
right used bins that I specified.
 There are a couple of ways to manipulate bins in matplotlib.
 Here, I specified where the edges of the bars of the histogram are; the
bin edges.
 You could also specify the number of bins, and Matplotlib will automatically generate that number of evenly spaced bins.
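A sketch of the two ways of setting bins. To stay free of plotting, it uses np.histogram(), which takes the same kind of bins argument as plt.hist() (an integer count or a sequence of edges) and returns the counts a histogram would draw. The attack values are hypothetical:

```python
import numpy as np

# Hypothetical attack values standing in for a real column
attack = np.array([20, 35, 48, 50, 55, 62, 70, 84, 90, 110, 130])

# Option 1: give a number of bins; edges are spaced evenly between min and max
counts_n, edges_n = np.histogram(attack, bins=4)
print(edges_n)

# Option 2: give the bin edges yourself (the last bin includes its right edge)
my_edges = [0, 25, 50, 75, 100, 125, 150]
counts_e, edges_e = np.histogram(attack, bins=my_edges)
print(counts_e)
```

With Matplotlib, the equivalent calls would be plt.hist(attack, bins=4) and plt.hist(attack, bins=my_edges).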
Seaborn
 Matplotlib is a powerful, but sometimes unwieldy, Python library.
 Seaborn provides a high-level interface to Matplotlib and makes it easier
to produce graphs like the one on the right.
 Some IDEs incorporate elements of this under the hood nowadays.
Benefits of Seaborn
 Seaborn offers:
- Using default themes that are aesthetically pleasing.
- Setting custom colour palettes.
- Making attractive statistical plots.
- Easily and flexibly displaying distributions.
- Visualising information from matrices and DataFrames.
 The last three points have led to Seaborn becoming the exploratory
data analysis tool of choice for many Python users.
Plotting with Seaborn
 One of Seaborn's greatest strengths is its diversity of plotting
functions.
 Most plots can be created with one line of code.
 For example.
Histograms
 Allow you to plot the distributions of numeric variables.
Other types of graphs: Creating a scatter plot
 Here we use Seaborn's linear model plot function, lmplot(), to create a scatter graph: we pass it the name of the variable we want on the x-axis (x=), the name of the variable we want on the y-axis (y=), and the name of our dataframe via the data= argument.
 Older versions of Seaborn did not have a dedicated scatter plot function (scatterplot() was added in version 0.9).
 So here we used Seaborn's function for fitting and plotting a regression line; hence lmplot().
 However, Seaborn makes it easy to alter plots.
 To remove the regression line, we use the fit_reg=False argument.
The hue argument
 Another useful feature of Seaborn is the hue argument, which enables us to use a variable to colour-code our data points.
Factor plots
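 Factor plots make it easy to separate plots by categorical classes.
 In this example, we colour the points by stage, separate the plots by stage, draw them as a swarm plot, and rotate the x-tick labels by 45 degrees.
 Note that factorplot() was renamed catplot() in Seaborn 0.9.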
Q-Step_WS_06112019_Data_Analysis_and_visualisation_with_Python.pptx
A box plot
 The total, stage, and legendary entries are not combat stats, so we should remove them.
 Pandas makes this easy: we just use the .drop() function to create a new dataframe that doesn't include the variables we don't want.
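A sketch on a hypothetical two-row miniature of the Pokemon data; the column names are assumptions based on the text, and the real file has more columns:

```python
import pandas as pd

# Hypothetical miniature of the Pokemon data frame
df = pd.DataFrame({
    "Name": ["A", "B"],
    "HP": [45, 60],
    "Attack": [49, 62],
    "Total": [318, 405],
    "Stage": [1, 2],
    "Legendary": [False, False],
})

# drop() returns a new data frame; df keeps all its columns
stats_df = df.drop(["Total", "Stage", "Legendary"], axis=1)
print(list(stats_df.columns))  # ['Name', 'HP', 'Attack']
```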
Seaborn's themes
 Seaborn has a number of themes you can use to alter the appearance
of plots.
 For example, we can use whitegrid to add grid lines to our boxplot.
Violin plots
 Violin plots are useful alternatives to box plots.
 They show the distribution of a variable through the thickness of the violin.
 Here, we visualise the distribution of attack by Pokémon's primary type.
 Dragon types tend to have higher Attack stats than Ghost types, but they also have greater
variance. But there is something not right here.
 The colours!
Seaborn's colour palettes
 Seaborn allows us to easily set custom colour palettes by providing it with an ordered list of colour hex values.
 We first create our colours list.
 Then we just use the palette= argument and feed in our colours list.
 Because of the limited number of observations, we could also use a
swarm plot.
 Here, each data point is an observation, but data points are grouped
together by the variable listed on the x-axis.
Overlapping plots
 Both of these show similar information, so it might be useful to
overlap them.
 To overlap them, we set the size of the print canvas, remove the bars from inside the violins, make the swarm points black and slightly transparent, and give the graph a title.
Data wrangling with Pandas
 What if we wanted to create such a plot that included all of the other
stats as well?
 In our current dataframe, all of the variables are in different columns:
 If we want to visualise all stats, then we'll have to melt the dataframe.
 We use the .drop() function again to re-create the dataframe without these three variables.
 We then melt it, passing the dataframe we want to melt, the variables to keep (all others will be melted), and a name for the new, melted variable.
 All 6 of the stat columns have been "melted" into one, and the new Stat column indicates the original stat (HP, Attack, Defense, Sp. Attack, Sp. Defense, or Speed).
 It's hard to see here, but each Pokemon now has 6 rows of data; hence melted_df has 6 times as many rows as the original.
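A sketch of melting, assuming a hypothetical miniature stats_df with only three stat columns, so each Pokemon ends up with 3 rows here rather than 6. It uses pd.melt() with its id_vars (columns to keep) and var_name (name of the new column) parameters:

```python
import pandas as pd

# Hypothetical miniature of stats_df: one row per Pokemon, one column per stat
stats_df = pd.DataFrame({
    "Name": ["A", "B"],
    "Type 1": ["Grass", "Fire"],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

# Keep the identifier columns; every other column is melted into Stat/value pairs
melted_df = pd.melt(stats_df, id_vars=["Name", "Type 1"], var_name="Stat")
print(melted_df)
```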
 This graph could be made to look nicer with a few tweaks: enlarging the plot, separating points by hue, using our special Pokemon colour palette, adjusting the y-axis, and moving the legend box outside of the graph, to the right of it.
Plotting all data: Empirical cumulative distribution functions (ECDFs)
 An alternative way of visualising the distribution of a variable in a large dataset is to use an ECDF.
 Here we have an ECDF that shows the percentages of different attack strengths of Pokemon.
 The x-value of an ECDF is the quantity you are measuring, i.e. attack strength.
 The y-value is the fraction of data points that have a value smaller than or equal to the corresponding x-value. For example:
- 20% of Pokemon have an attack level of 50 or less.
- 75% of Pokemon have an attack level of 90 or less.
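A minimal sketch of computing ECDF points with NumPy, assuming a hypothetical helper called ecdf() and hypothetical attack values: sort the data for x, and for y use i/n for the i-th sorted value:

```python
import numpy as np


def ecdf(values):
    """Return x (sorted values) and y (fraction of points <= x) for an ECDF."""
    x = np.sort(np.asarray(values))
    n = len(x)
    y = np.arange(1, n + 1) / n
    return x, y


# Hypothetical attack values
attack = [50, 90, 30, 120, 70]
x, y = ecdf(attack)
print(x)
print(y)
```

The points can then be drawn with plt.plot(x, y, marker=".", linestyle="none"); calling the helper once per variable gives several ECDFs on the same axes.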
Plotting an ECDF
 You can also plot multiple ECDFs on the same plot.
 As an example, here we have ECDFs for Pokemon attack, speed, and defence levels.
 We can see here that defence levels tend to be a little lower than the other two.
The usefulness of ECDFs
 It is often quite useful to plot the ECDF first as part of your workflow.
 It shows all the data and gives a complete picture as to how the data
are distributed.
Heatmaps
 Useful for visualising matrix-like data.
 Here, we'll plot the correlation of the stats_df variables.
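A sketch of the matrix behind such a heatmap, assuming a hypothetical numeric stand-in for stats_df; DataFrame.corr() returns the pairwise Pearson correlations as a square data frame:

```python
import pandas as pd

# Hypothetical numeric stats standing in for stats_df
stats_df = pd.DataFrame({
    "HP": [45, 60, 80, 39],
    "Attack": [49, 62, 82, 52],
    "Defense": [49, 63, 83, 43],
})

# Pairwise Pearson correlations: a square, matrix-like data frame
corr = stats_df.corr()
print(corr.round(2))
```

The result can be passed straight to sns.heatmap(corr). If the frame still contains text columns such as Name, recent pandas versions need corr(numeric_only=True).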
Bar plot
 Visualises the distributions of categorical variables.
 Here, the x-tick labels are rotated by 45 degrees.
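A sketch of the counts such a bar plot displays, assuming hypothetical primary types; value_counts() tallies each category, largest first:

```python
import pandas as pd

# Hypothetical primary types
types = pd.Series(["Water", "Fire", "Water", "Grass", "Water", "Fire"], name="Type 1")

# Counting each category gives the bar heights a count/bar plot displays
counts = types.value_counts()
print(counts)
```

In Seaborn, sns.countplot(x="Type 1", data=df) draws these counts directly, and plt.xticks(rotation=45) rotates the tick labels.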
Joint Distribution Plot
 Joint distribution plots combine information from scatter plots and histograms to give you detailed information for bivariate distributions.
Any questions?

Design approaches and ethical challenges in Artificial Intelligence tools for...
Yannis
Anti-Fungal Agents.pptx Medicinal Chemistry III B. Pharm Sem VI
Anti-Fungal Agents.pptx Medicinal Chemistry III B. Pharm Sem VIAnti-Fungal Agents.pptx Medicinal Chemistry III B. Pharm Sem VI
Anti-Fungal Agents.pptx Medicinal Chemistry III B. Pharm Sem VI
Samruddhi Khonde
Berry_Kanisha_BAS_PB1_202503 (2) (2).pdf
Berry_Kanisha_BAS_PB1_202503 (2) (2).pdfBerry_Kanisha_BAS_PB1_202503 (2) (2).pdf
Berry_Kanisha_BAS_PB1_202503 (2) (2).pdf
KanishaBerry
ANTIVIRAL agent by Mrs. Manjushri Dabhade
ANTIVIRAL agent by Mrs. Manjushri DabhadeANTIVIRAL agent by Mrs. Manjushri Dabhade
ANTIVIRAL agent by Mrs. Manjushri Dabhade
Dabhade madam Dabhade
General Quiz at Maharaja Agrasen College | Amlan Sarkar | Prelims with Answer...
General Quiz at Maharaja Agrasen College | Amlan Sarkar | Prelims with Answer...General Quiz at Maharaja Agrasen College | Amlan Sarkar | Prelims with Answer...
General Quiz at Maharaja Agrasen College | Amlan Sarkar | Prelims with Answer...
Amlan Sarkar
Unit1 Inroduction to Internal Combustion Engines
Unit1  Inroduction to Internal Combustion EnginesUnit1  Inroduction to Internal Combustion Engines
Unit1 Inroduction to Internal Combustion Engines
NileshKumbhar21
MIPLM subject matter expert Nicos Raftis
MIPLM subject matter expert Nicos RaftisMIPLM subject matter expert Nicos Raftis
MIPLM subject matter expert Nicos Raftis
MIPLM
The basics of sentences session 9pptx.pptx
The basics of sentences session 9pptx.pptxThe basics of sentences session 9pptx.pptx
The basics of sentences session 9pptx.pptx
heathfieldcps1
MIPLM subject matter expert Daniel Holzner
MIPLM subject matter expert Daniel HolznerMIPLM subject matter expert Daniel Holzner
MIPLM subject matter expert Daniel Holzner
MIPLM
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
heathfieldcps1

Q-Step_WS_06112019_Data_Analysis_and_visualisation_with_Python.pptx

  • 1. Data Analysis and Visualisation with Python Lewys Brace l.brace@Exeter.ac.uk Q-Step Workshop 06/11/2019
  • 2. Numerical Python (NumPy) NumPy is the most foundational package for numerical computing in Python. If you are going to work on data analysis or machine learning projects, then having a solid understanding of NumPy is nearly mandatory. Indeed, many other libraries, such as pandas and scikit-learn, use NumPy's array objects as the lingua franca for data exchange. One reason NumPy is so important for numerical computation is that it is designed for efficiency with large arrays of data. This is because: - It stores data internally in a contiguous block of memory, independent of other built-in Python objects. - It performs complex computations on entire arrays without the need for Python for-loops.
  • 3. What you'll find in NumPy - ndarray: an efficient multidimensional array providing fast array-orientated arithmetic operations and flexible broadcasting capabilities. - Mathematical functions for fast operations on entire arrays of data without having to write loops. - Tools for reading/writing array data to disk and working with memory-mapped files. - Linear algebra, random number generation, and Fourier transform capabilities. - A C API for connecting NumPy with libraries written in C, C++, and FORTRAN. This is one reason why Python is a popular choice for wrapping legacy codebases.
  • 4. The NumPy ndarray: A multi-dimensional array object The NumPy ndarray object is a fast and flexible container for large data sets in Python. NumPy arrays are a bit like Python lists, but are still a very different beast at the same time. Arrays enable you to store multiple items of the same data type. It is the facilities around the array object that make NumPy so convenient for performing maths and data manipulations.
  • 5. Ndarray vs. lists By now, you are familiar with Python lists and how incredibly useful they are. So, you may be asking yourself: "I can store numbers and other objects in a Python list and do all sorts of computations and manipulations through list comprehensions, for-loops, etc. What do I need a NumPy array for?" There are very significant advantages to using NumPy arrays over lists.
  • 6. Creating a NumPy array To understand these advantages, let's create an array. One of the most common ways to create a NumPy array is to pass a list to the np.array() function.
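A minimal sketch of creating an array from a list; the names list1 and arr1d and the values are illustrative, not taken from the original slides.

```python
import numpy as np

# An ordinary Python list of integers (illustrative values)
list1 = [0, 1, 2, 3, 4]

# np.array() copies the list's contents into a new ndarray
arr1d = np.array(list1)

print(type(arr1d))  # <class 'numpy.ndarray'>
print(arr1d)        # [0 1 2 3 4]
```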
  • 7. Differences between lists and ndarrays The key difference between an array and a list is that arrays are designed to handle vectorised operations while Python lists are not. That means that if you apply a function to an array, it is performed on every item in the array, rather than on the array object as a whole.
  • 8. Let's suppose you want to add the number 2 to every item in a list. The intuitive way to do this is to add 2 to the list directly. That is not possible with a list, but you can do it with an array.
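The contrast between lists and arrays can be sketched as follows (illustrative values): adding a number to a list raises a TypeError, while the same expression on an array is applied element-wise.

```python
import numpy as np

list1 = [0, 1, 2, 3, 4]

# list + int is not defined for Python lists, so this raises TypeError
try:
    list1 + 2
except TypeError as err:
    print("List error:", err)

# On an ndarray the addition is vectorised: it applies to every element
arr1d = np.array(list1)
result = arr1d + 2
print(result)  # [2 3 4 5 6]

# The list equivalent needs an explicit loop or comprehension
list_result = [x + 2 for x in list1]
print(list_result)  # [2, 3, 4, 5, 6]
```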
  • 9. It should be noted here that, once a NumPy array is created, you cannot increase its size. To do so, you will have to create a new array.
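As an illustration of the fixed-size point, np.append() does not grow an array in place; it returns a new, larger array and leaves the original untouched (values are illustrative).

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append allocates and returns a brand-new array
bigger = np.append(arr, [4, 5])

print(arr)     # [1 2 3]  (unchanged)
print(bigger)  # [1 2 3 4 5]
```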
  • 10. Creating a 2D array from a list of lists You can pass a list of lists to create a matrix-like 2D array.
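A short sketch, with illustrative values: each inner list becomes one row of the resulting 2D array.

```python
import numpy as np

# Three inner lists of equal length -> a 3 x 3 array
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d = np.array(list2)

print(arr2d)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
print(arr2d.shape)  # (3, 3)
```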
  • 11. The dtype argument You can specify the data type by setting the dtype argument. Some of the most commonly used NumPy dtypes are: float, int, bool, str, and object.
  • 12. The astype method You can also convert an array to a different data type using the astype() method. Remember that, unlike lists, all items in an array have to be of the same type.
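A sketch of both approaches on illustrative data: dtype fixes the type at creation time, while astype() returns a converted copy of an existing array.

```python
import numpy as np

list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Set the element type when the array is created
arr2d_f = np.array(list2, dtype="float")
print(arr2d_f.dtype)  # float64

# Convert an existing array; each call returns a new array
arr2d_i = arr2d_f.astype("int")
arr2d_s = arr2d_i.astype("str")
print(arr2d_s[0])  # ['0' '1' '2']
```

The exact integer width chosen by astype("int") can vary by platform, so the test below checks only that it is an integer type.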
  • 13. dtype=object However, if you are uncertain about what data type your array will hold, or if you want to hold characters and numbers in the same array, you can set the dtype to 'object'.
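A minimal sketch with illustrative values: an object array stores references to arbitrary Python objects, so mixed types can coexist (at the cost of NumPy's speed advantages).

```python
import numpy as np

# Numbers and a string in one array
mixed = np.array([1, "a", 2.5], dtype="object")

print(mixed)        # [1 'a' 2.5]
print(mixed.dtype)  # object
```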
  • 14. The tolist() method You can always convert an array into a list using the tolist() method.
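A quick sketch with illustrative values: tolist() turns a 2D array into nested Python lists of plain Python scalars.

```python
import numpy as np

arr2d = np.array([[0, 1, 2], [3, 4, 5]])

nested = arr2d.tolist()
print(nested)  # [[0, 1, 2], [3, 4, 5]]
```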
  • 15. Inspecting a NumPy array NumPy arrays have a range of built-in attributes that allow you to inspect different aspects of an array, such as its shape, data type, size, and number of dimensions.
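The commonly used inspection attributes can be sketched on an illustrative 3 x 4 array:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype="float")

print("Shape:", arr2.shape)           # (3, 4)
print("Datatype:", arr2.dtype)        # float64
print("Size:", arr2.size)             # 12 (total number of elements)
print("Num dimensions:", arr2.ndim)   # 2
```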
  • 16. Extracting specific items from an array You can extract portions of the array using indices, much like when you're working with lists. Unlike lists, however, arrays can accept as many indices inside the square brackets as the array has dimensions.
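A sketch of multi-dimensional indexing on illustrative data: one index or slice per dimension, separated by commas.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]])

# First two rows and first two columns
print(arr2[:2, :2])
# [[1 2]
#  [3 4]]

# A single element: row 1, column 2 (both zero-based)
print(arr2[1, 2])  # 5
```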
  • 17. Boolean indexing A boolean index array has the same shape as the array to be filtered, but it only contains True and False values.
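A minimal sketch with illustrative data: a comparison produces a same-shaped boolean array, and indexing with it keeps only the elements marked True (returned as a flat array).

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]])

mask = arr2 > 4
print(mask)
# [[False False False False]
#  [False False  True  True]
#  [ True  True  True  True]]

print(arr2[mask])  # [5 6 5 6 7 8]
```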
  • 18. Pandas Pandas, like NumPy, is one of the most popular Python libraries for data analysis. It is a high-level abstraction over the lower-level NumPy, which is written largely in C. Pandas provides high-performance, easy-to-use data structures and data analysis tools. There are two main structures used by pandas: data frames and series.
  • 19. Indices in a pandas series A pandas series is similar to a list, but differs in that a series associates a label with each element. This makes it look like a dictionary. If an index is not explicitly provided by the user, pandas creates a RangeIndex running from 0 to N-1. Each series object also has a data type.
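A sketch with illustrative values: without an explicit index, pandas assigns a RangeIndex and infers a data type for the series.

```python
import pandas as pd

obj = pd.Series([1, 2, -3, 4, -5])

print(obj)
print(obj.index)  # RangeIndex(start=0, stop=5, step=1)
print(obj.dtype)  # int64
```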
  • 20. As you may suspect by this point, a series has ways to extract all of the values in the series, as well as individual elements by index. You can also provide an index manually.
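A short sketch, assuming illustrative values and letter labels: .values returns the underlying NumPy array, and an index can be supplied with the index= argument.

```python
import pandas as pd

obj = pd.Series([1, 2, -3, 4, -5])
print(obj.values)  # [ 1  2 -3  4 -5]
print(obj[1])      # 2  (label 1 in the default RangeIndex)

# Supplying labels explicitly
obj2 = pd.Series([1, 2, -3, 4, -5], index=["d", "b", "a", "c", "e"])
print(obj2["b"])   # 2
```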
  • 21. It is easy to retrieve several elements of a series by their indices or make group assignments.
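A sketch on the same illustrative series: passing a list of labels retrieves several elements, and assigning to that selection updates them all at once.

```python
import pandas as pd

obj2 = pd.Series([1, 2, -3, 4, -5], index=["d", "b", "a", "c", "e"])

# Several elements at once, in the order requested
print(obj2[["c", "a", "d"]].tolist())  # [4, -3, 1]

# Group assignment
obj2[["c", "a", "d"]] = 0
print(obj2.tolist())  # [0, 2, 0, 0, -5]
```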
  • 22. Filtering and maths operations Filtering and maths operations are easy with Pandas as well.
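A sketch with illustrative values: boolean filtering keeps the labels alongside the values, and arithmetic or NumPy functions apply element-wise while preserving the index.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([1, 2, -3, 4, -5], index=["d", "b", "a", "c", "e"])

positives = obj2[obj2 > 0]
print(positives.tolist())  # [1, 2, 4]

doubled = obj2 * 2
print(doubled["e"])        # -10

print(np.abs(obj2).tolist())  # [1, 2, 3, 4, 5]
```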
  • 23. Pandas data frame Simplistically, a data frame is a table, with rows and columns. Each column in a data frame is a series object, and each row consists of the elements at the same position in those series. For example:
      Case ID | Variable one | Variable two | Variable 3
      1       | 123          | ABC          | 10
      2       | 456          | DEF          | 20
      3       | 789          | XYZ          | 30
  • 24. Creating a Pandas data frame Pandas data frames can be constructed using Python dictionaries.
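A minimal sketch that builds the example table from slide 23 out of a dictionary: each key becomes a column name and each list becomes that column's values.

```python
import pandas as pd

data = {
    "Variable one": [123, 456, 789],
    "Variable two": ["ABC", "DEF", "XYZ"],
    "Variable 3": [10, 20, 30],
}
df = pd.DataFrame(data)
print(df)
```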
  • 25. You can also create a data frame from a list.
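A sketch of the same table built from a list of rows; with this approach the column names are passed separately through the columns= argument.

```python
import pandas as pd

rows = [[123, "ABC", 10], [456, "DEF", 20], [789, "XYZ", 30]]
df = pd.DataFrame(rows, columns=["Variable one", "Variable two", "Variable 3"])
print(df)
```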
  • 26. You can ascertain the type of a column with the type() function.
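A quick check on an illustrative data frame, confirming that a single selected column is a pandas Series:

```python
import pandas as pd

df = pd.DataFrame({"Variable one": [123, 456, 789], "Variable 3": [10, 20, 30]})

col_type = type(df["Variable one"])
print(col_type)  # <class 'pandas.core.series.Series'>
```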
  • 27. A Pandas data frame object has two indices: a column index and a row index. Again, if you do not provide one, Pandas will create a RangeIndex from 0 to N-1.
  • 28. There are numerous ways to provide row indices explicitly. For example, you could provide an index when creating a data frame, or do it during runtime. Here, I also named the index "country code".
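A sketch of both options using the slide 23 table; the labels "a", "b", "c" and the index name "Case ID" are illustrative choices, not the original slide's country data.

```python
import pandas as pd

data = {
    "Variable one": [123, 456, 789],
    "Variable two": ["ABC", "DEF", "XYZ"],
    "Variable 3": [10, 20, 30],
}

# Option 1: supply row labels when the data frame is created
df = pd.DataFrame(data, index=[1, 2, 3])

# Option 2: replace the labels at runtime and give the index a name
df.index = ["a", "b", "c"]
df.index.name = "Case ID"
print(df)
```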
  • 29. Row access using the index can be performed in several ways. First, you could use .loc[] and provide an index label. Second, you could use .iloc[] and provide an integer position.
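A sketch on illustrative data: .loc[] selects by label and .iloc[] by zero-based position, so here both return the same row.

```python
import pandas as pd

df = pd.DataFrame(
    {"Variable one": [123, 456, 789],
     "Variable two": ["ABC", "DEF", "XYZ"],
     "Variable 3": [10, 20, 30]},
    index=["a", "b", "c"],
)

row_by_label = df.loc["b"]     # by index label
row_by_position = df.iloc[1]   # by integer position
print(row_by_label.equals(row_by_position))  # True
```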
  • 30. Particular rows and columns can be selected this way. You can feed .loc[] two arguments, a list of index labels and a list of column names; slicing is supported as well.
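A sketch on illustrative data. Note that label-based slices with .loc[] include both endpoints, unlike positional slicing.

```python
import pandas as pd

df = pd.DataFrame(
    {"Variable one": [123, 456, 789],
     "Variable two": ["ABC", "DEF", "XYZ"],
     "Variable 3": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Rows "a" and "c", two named columns
subset = df.loc[["a", "c"], ["Variable one", "Variable 3"]]

# Label slices for both rows and columns (endpoints included)
sliced = df.loc["a":"b", "Variable two":"Variable 3"]
print(subset)
print(sliced)
```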
  • 31. Filtering Filtering is performed using so-called Boolean arrays.
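A minimal sketch on illustrative data: a comparison on a column yields a boolean Series, and indexing the data frame with it keeps the rows marked True.

```python
import pandas as pd

df = pd.DataFrame(
    {"Variable one": [123, 456, 789], "Variable 3": [10, 20, 30]},
    index=["a", "b", "c"],
)

mask = df["Variable 3"] > 15   # boolean Series aligned on the row index
filtered = df[mask]
print(filtered)
```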
  • 32. Deleting columns You can delete a column using the drop() function.
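A sketch on illustrative data showing two equivalent forms of drop(); by default it returns a new data frame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame(
    {"Variable one": [123, 456, 789],
     "Variable two": ["ABC", "DEF", "XYZ"],
     "Variable 3": [10, 20, 30]}
)

df_reduced = df.drop(columns=["Variable two"])      # keyword form
df_reduced_alt = df.drop("Variable two", axis=1)    # axis form
print(df_reduced)
```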
  • 33. Reading from and writing to a file Pandas supports many popular file formats including CSV, XML, HTML, Excel, SQL, JSON, etc. Out of all of these, CSV is the file format that you will work with the most. You can read in the data from a CSV file using the read_csv() function. Similarly, you can write a data frame to a CSV file with the to_csv() function.
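A round-trip sketch. To stay self-contained it writes to an in-memory buffer (io.StringIO) instead of a real file; with a file you would pass a path such as "data.csv" (a hypothetical name) to both functions.

```python
import io
import pandas as pd

df = pd.DataFrame(
    {"Variable one": [123, 456, 789],
     "Variable two": ["ABC", "DEF", "XYZ"],
     "Variable 3": [10, 20, 30]},
    index=["a", "b", "c"],
)

buffer = io.StringIO()
df.to_csv(buffer)           # the row index is written as the first column
buffer.seek(0)              # rewind before reading back

# index_col=0 turns that first column back into the row index
df_back = pd.read_csv(buffer, index_col=0)
print(df_back)
```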
  • 34. Pandas has the capacity to do much more than what we have covered here, such as grouping data and even data visualisation. However, as with NumPy, we don't have enough time to cover every aspect of pandas here.
  • 35. Exploratory data analysis (EDA) Exploring your data is a crucial step in data analysis. It involves: - Organising the data set. - Plotting aspects of the data set. - Possibly producing some numerical summaries (central tendency, spread, etc.). "Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone." - John Tukey.
  • 36. Download the data Download the Pokemon dataset from: https://github.com/LewBrace/da_and_vis_python Unzip the folder, and save the data file in a location you'll remember.
  • 37. Reading in the data First we import the Python packages we are going to use. Then we use Pandas to load in the dataset as a data frame. NOTE: The index_col argument states that we'll treat the first column of the dataset as the ID column. NOTE: The encoding argument allows us to bypass a decoding error caused by special characters in the data set.
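A self-contained sketch of what index_col and encoding do. It does not read the real Pokemon file: the bytes below are a made-up stand-in (the "#" column name, the rows and the values are all hypothetical), encoded as Latin-1 so that the accented name is not valid UTF-8. The encoding actually needed for the real file is an assumption to check against the data.

```python
import io
import pandas as pd

# Hypothetical CSV content with a non-ASCII name, encoded as Latin-1
raw = "#,Name,Attack\n1,Alpha,10\n2,B\u00e9b\u00e9,20\n".encode("latin-1")

# index_col=0: use the first column as the row index
# encoding="latin-1": decode the bytes correctly (as UTF-8 they are invalid)
df = pd.read_csv(io.BytesIO(raw), index_col=0, encoding="latin-1")
print(df)
```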
  • 39. We could spend time staring at these numbers, but that is unlikely to offer us any form of insight. We could begin by conducting all of our statistical tests. However, a good field commander never goes into battle without first doing a reconnaissance of the terrain. This is exactly what EDA is for.
  • 40. Plotting a histogram in Python
  • 41. Bins You may have noticed the two histograms we've seen so far look different, despite using the exact same data. This is because they have different bin values. The left graph used the default bins generated by plt.hist(), while the one on the right used bins that I specified.
  • 42. There are a couple of ways to manipulate bins in Matplotlib. Here, I specified where the edges of the bars of the histogram are: the bin edges.
  • 43. You could also specify the number of bins, and Matplotlib will automatically generate that number of evenly spaced bins.
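Both ways of setting bins can be checked without drawing anything, because plt.hist() follows the same binning rules as np.histogram(). This sketch uses np.histogram() on hypothetical attack values; passing the same bins= value to plt.hist() would draw the corresponding bars.

```python
import numpy as np

attack = np.array([49, 62, 82, 100, 52, 64, 84, 130, 104, 48])  # hypothetical values

# 1) Explicit bin edges; every bin is half-open except the last, which includes 150
edges = [10, 30, 50, 70, 90, 110, 130, 150]
counts_edges, _ = np.histogram(attack, bins=edges)
print(counts_edges)  # [0 2 3 2 2 0 1]

# 2) A number of bins: evenly spaced edges between the minimum and maximum
counts_n, edges_n = np.histogram(attack, bins=4)
print(edges_n)   # [ 48.   68.5  89.  109.5 130. ]
print(counts_n)  # [5 2 2 1]
```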
  • 44. Seaborn Matplotlib is a powerful, but sometimes unwieldy, Python library. Seaborn provides a high-level interface to Matplotlib and makes it easier to produce graphs like the one on the right. Some IDEs incorporate elements of this under the hood nowadays.
  • 45. Benefits of Seaborn Seaborn offers: - Using default themes that are aesthetically pleasing. - Setting custom colour palettes. - Making attractive statistical plots. - Easily and flexibly displaying distributions. - Visualising information from matrices and DataFrames. The last three points have led to Seaborn becoming the exploratory data analysis tool of choice for many Python users.
  • 46. Plotting with Seaborn One of Seaborn's greatest strengths is its diversity of plotting functions. Most plots can be created with one line of code. For example.
  • 47. Histograms These allow you to plot the distribution of numeric variables.
  • 48. Other types of graphs: Creating a scatter plot Here we use lmplot(), Seaborn's linear model plot function, to create a scatter graph. Its arguments give the name of the variable we want on the y-axis, the name of the variable we want on the x-axis, and the name of our data frame, fed to the data= argument.
  • 49. Here we did not use a dedicated scatter plot function (seaborn.scatterplot() was only added in Seaborn 0.9). Instead, we used Seaborn's function for fitting and plotting a regression line; hence lmplot(). However, Seaborn makes it easy to alter plots. To remove the regression line, we use the fit_reg=False argument.
  • 50. The hue parameter Another useful feature of Seaborn is the hue parameter, which enables us to use a variable to colour-code our data points.
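  • 51. Factor plots Factor plots make it easy to separate plots by categorical classes. In this example, we colour the points by stage, separate the plots by stage, generate each one as a swarm plot, and rotate the x-tick labels by 45 degrees. (In Seaborn 0.9 and later, factorplot() has been renamed catplot().)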
  • 54. The Total, Stage, and Legendary entries are not combat stats, so we should remove them. Pandas makes this easy to do: we just use the .drop() function to create a new data frame that doesn't include the variables we don't want.
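A sketch of this step on a tiny, hypothetical stand-in for the Pokemon data frame. The column names Total, Stage and Legendary are assumed to match the dataset, the names and values are made up, and only a few stat columns are included; the result is called stats_df, the name used for it later in the slides.

```python
import pandas as pd

# Hypothetical miniature of the Pokemon data frame
df = pd.DataFrame({
    "Name": ["A", "B"],
    "HP": [45, 60],
    "Attack": [49, 62],
    "Defense": [49, 63],
    "Speed": [45, 60],
    "Total": [188, 245],
    "Stage": [1, 2],
    "Legendary": [False, False],
})

# axis=1 means "drop columns"; a new data frame is returned
stats_df = df.drop(["Total", "Stage", "Legendary"], axis=1)
print(stats_df)
```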
  • 55. Seaborn's themes Seaborn has a number of themes you can use to alter the appearance of plots. For example, we can use the whitegrid theme to add grid lines to our box plot.
  • 56. Violin plots Violin plots are useful alternatives to box plots. They show the distribution of a variable through the thickness of the violin. Here, we visualise the distribution of attack by Pokémon primary type:
  • 57. Dragon types tend to have higher Attack stats than Ghost types, but they also have greater variance. But there is something not right here. The colours!
  • 58. Seaborn's colour palettes Seaborn allows us to easily set custom colour palettes by providing it with an ordered list of colour hex values. We first create our colours list.
  • 59. Then we just use the palette= argument and feed in our colours list.
  • 60. Because of the limited number of observations, we could also use a swarm plot. Here, each data point is an observation, but data points are grouped together by the variable listed on the x-axis.
  • 61. Overlapping plots Both of these show similar information, so it might be useful to overlap them. In the code, we set the size of the figure canvas, remove the bars from inside the violins, make the overlaid points black and slightly transparent, and give the graph a title.
  • 63. Data wrangling with Pandas What if we wanted to create such a plot that included all of the other stats as well? In our current dataframe, all of the variables are in different columns:
  • 64. If we want to visualise all of the stats, then we'll have to melt the data frame. We use the .drop() function again to re-create the data frame without these three variables. The melt call then takes the data frame we want to melt, the variables to keep (all others will be melted), and a name for the new, melted variable. All 6 of the stat columns have been "melted" into one, and the new Stat column indicates the original stat (HP, Attack, Defense, Sp. Attack, Sp. Defense, or Speed). It's hard to see here, but each Pokemon now has 6 rows of data; hence melted_df has 6 times as many rows as the original data frame.
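A sketch of the melt step with pd.melt() on a tiny, hypothetical stand-in for stats_df. The id columns Name and Type 1 and all values are assumptions, and only three stat columns are included, so each row becomes three rows here rather than six.

```python
import pandas as pd

stats_df = pd.DataFrame({
    "Name": ["A", "B"],          # hypothetical
    "Type 1": ["Grass", "Fire"], # hypothetical
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

melted_df = pd.melt(
    stats_df,                    # the data frame to melt
    id_vars=["Name", "Type 1"],  # columns to keep; all others are melted
    var_name="Stat",             # name for the new, melted variable
)
print(melted_df)
```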
  • 66. This graph could be made to look nicer with a few tweaks: enlarge the plot, separate points by hue, use our special Pokemon colour palette, adjust the y-axis, and move the legend box outside of the graph, to the right of it.
  • 68. Plotting all data: Empirical cumulative distribution functions (ECDFs) An alternative way of visualising the distribution of a variable in a large dataset is to use an ECDF. Here we have an ECDF that shows the percentages of different attack strengths of Pokemon. The x-value of an ECDF is the quantity you are measuring, i.e. attack strength. The y-value is the fraction of data points that have a value less than or equal to the corresponding x-value. For example:
  • 69. 20% of Pokemon have an attack level of 50 or less. 75% of Pokemon have an attack level of 90 or less
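An ECDF takes only a few lines to compute. This sketch uses NumPy on hypothetical attack values; the helper name ecdf() is illustrative. Sorting gives the x-values, and the y-values run in even steps of 1/n up to 1.

```python
import numpy as np

def ecdf(data):
    """Return sorted x-values and the fraction of points <= each x."""
    x = np.sort(data)
    n = x.size
    y = np.arange(1, n + 1) / n   # evenly spaced, ending at 1.0
    return x, y

attack = np.array([49, 62, 82, 100, 52, 64, 84, 130, 104, 48])  # hypothetical values
x, y = ecdf(attack)

# Reading the curve: fraction of Pokemon with attack <= 64
frac = np.mean(attack <= 64)
print(frac)  # 0.5

# To draw it: plt.plot(x, y, marker=".", linestyle="none")
```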
  • 71. You can also plot multiple ECDFs on the same plot. As an example, here we have an ECDF for Pokemon attack, speed, and defence levels. We can see here that defence levels tend to be a little lower than the other two.
  • 72. The usefulness of ECDFs It is often quite useful to plot the ECDF first as part of your workflow. It shows all the data and gives a complete picture as to how the data are distributed.
  • 73. Heatmaps Useful for visualising matrix-like data. Here, we'll plot the correlation of the stats_df variables.
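The matrix behind such a heatmap can be computed with DataFrame.corr(). This sketch uses a hypothetical, numeric-only stand-in for stats_df; the resulting matrix is square, symmetric, and has 1s on its diagonal. Passing it to Seaborn's heatmap() would then draw it.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric stats (not the real dataset)
stats_df = pd.DataFrame({
    "HP": [45, 60, 80, 39],
    "Attack": [49, 62, 82, 52],
    "Defense": [49, 63, 83, 43],
    "Speed": [45, 60, 80, 65],
})

corr = stats_df.corr()   # pairwise Pearson correlations between columns
print(corr.round(2))
```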
  • 74. Bar plot Visualises the distribution of categorical variables. Here, the x-tick labels are rotated by 45 degrees.
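The bars in such a plot are simply category counts, which pandas' value_counts() can compute. This is a hedged sketch: the column name Type 1 and its values are hypothetical, and the slide does not show which Seaborn function was used (sns.countplot() is one that draws these counts directly).

```python
import pandas as pd

df = pd.DataFrame({"Type 1": ["Grass", "Fire", "Grass", "Water", "Fire", "Grass"]})

counts = df["Type 1"].value_counts()   # category -> number of rows
print(counts)
```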
  • 75. Joint Distribution Plot Joint distribution plots combine information from scatter plots and histograms to give you detailed information for bi-variate distributions.

Editor's Notes

  • #70: The y-axis values are evenly spaced, with a maximum of 1.