This document provides an overview of data analysis and visualization techniques using Python. It begins with an introduction to NumPy, the fundamental package for numerical computing in Python. NumPy stores data efficiently in arrays and allows for fast operations on entire arrays. The document then covers Pandas, which builds on NumPy and provides data structures like Series and DataFrames for working with structured and labeled data. It demonstrates how to load data, select subsets of data, and perform operations like filtering and aggregations. Finally, it discusses various data visualization techniques using Matplotlib and Seaborn like histograms, scatter plots, box plots, and heatmaps that can be used for exploratory data analysis to gain insights from data.
2. Numerical Python (NumPy)
NumPy is the most foundational package for numerical computing in Python.
If you are going to work on data analysis or machine learning projects, then
having a solid understanding of NumPy is nearly mandatory.
Indeed, many other libraries, such as pandas and scikit-learn, use NumPy's array
objects as the lingua franca for data exchange.
One of the reasons NumPy is so important for numerical computations is that
it is designed for efficiency with large arrays of data. Reasons for this include:
- It stores data internally in a contiguous block of memory, independent
of other built-in Python objects.
- It performs complex computations on entire arrays without the need
for Python for loops.
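A minimal sketch of both points (the variable names are illustrative): the array reports that its buffer is a single contiguous block, and a whole-array expression replaces an explicit Python loop while giving the same result.

```python
import numpy as np

# One million 64-bit integers stored in a single contiguous block of memory
values = np.arange(1_000_000, dtype=np.int64)
print(values.flags["C_CONTIGUOUS"])  # True

# Vectorised: the multiplication and the sum run in compiled code, no Python loop
vectorised_total = (values * 2).sum()

# Equivalent pure-Python version with an explicit loop, for comparison
loop_total = 0
for v in range(1_000_000):
    loop_total += v * 2

print(vectorised_total == loop_total)  # True
```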
3. What you'll find in NumPy
- ndarray: an efficient multidimensional array providing fast array-orientated
arithmetic operations and flexible broadcasting capabilities.
- Mathematical functions for fast operations on entire arrays of data without
having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform
capabilities.
- A C API for connecting NumPy with libraries written in C, C++, and FORTRAN.
This is one reason Python is a popular choice for wrapping legacy codebases.
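A short tour of several of these capabilities, as a sketch using standard NumPy calls (the equations, seed and file name are illustrative choices, not from the original slides):

```python
import os
import tempfile

import numpy as np

# Linear algebra: solve the system 2x + y = 5 and x + 3y = 10
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(a, b)  # x = 1, y = 3

# Random number generation with a seeded generator (reproducible)
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

# Fourier transform: the FFT of a unit impulse is flat (all ones)
spectrum = np.fft.fft([1.0, 0.0, 0.0, 0.0])

# Write the array to disk, then read it back as a memory-mapped array
path = os.path.join(tempfile.mkdtemp(), "solution.npy")  # hypothetical file name
np.save(path, solution)
loaded = np.load(path, mmap_mode="r")
```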
4. The NumPy ndarray: A multi-dimensional array object
The NumPy ndarray object is a fast and flexible container for large
data sets in Python.
NumPy arrays are a bit like Python lists, but they are a very different
beast at the same time.
Arrays enable you to store multiple items of the same data type. It is
the facilities around the array object that make NumPy so convenient
for performing math and data manipulations.
5. ndarray vs. lists
By now, you are familiar with Python lists and how incredibly useful
they are.
So, you may be asking yourself:
"I can store numbers and other objects in a Python list and do all sorts
of computations and manipulations through list comprehensions, for-loops,
etc. What do I need a NumPy array for?"
There are very significant advantages to using NumPy arrays over
lists.
6. Creating a NumPy array
To understand these advantages, let's create an array.
One of the most common of the many ways to create a NumPy array
is to pass a list to the np.array() function.
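The original input/output screenshot did not survive, so here is a minimal sketch of the idea (the list contents and the names list1/arr1d are illustrative):

```python
import numpy as np

# Start from an ordinary Python list...
list1 = [0, 1, 2, 3, 4]

# ...and pass it to np.array() to build a one-dimensional ndarray
arr1d = np.array(list1)

print(type(arr1d))  # <class 'numpy.ndarray'>
print(arr1d)        # [0 1 2 3 4]
```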
7. Differences between lists and ndarrays
The key difference between an array and a list is that arrays are
designed to handle vectorised operations, while Python lists are not.
That means that if you apply a function to an array, it is performed on every
item in the array rather than on the whole array object.
8. Let's suppose you want to add the number 2 to every item in the list.
The intuitive way to do this is to add 2 directly to the list.
That is not possible with a list (Python raises a TypeError), but you can do it with an array.
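A minimal sketch of the comparison, with illustrative data: the list addition fails, the array addition is applied element-wise, and the list needs a comprehension to get the same result.

```python
import numpy as np

list1 = [0, 1, 2, 3, 4]
arr1d = np.array(list1)

# Adding a number directly to a list is not defined: Python raises a TypeError
list_error = None
try:
    list1 + 2
except TypeError as err:
    list_error = err
print("List error:", list_error)

# With an ndarray, the addition is applied to every element
shifted = arr1d + 2
print(shifted)  # [2 3 4 5 6]

# The list equivalent needs an explicit comprehension
print([x + 2 for x in list1])  # [2, 3, 4, 5, 6]
```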
9. It should be noted here that, in practice, you cannot increase the size of
a NumPy array once it is created.
To add elements, you will have to create a new array.
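For example, np.append() does not grow an array in place; it returns a new, larger array and leaves the original untouched (a sketch with illustrative values):

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append builds and returns a brand-new array
bigger = np.append(arr, [4, 5])

print(arr)     # [1 2 3]  (unchanged)
print(bigger)  # [1 2 3 4 5]
```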
10. Create a 2D array from a list of lists
You can pass a list of lists to create a matrix-like 2D array.
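A minimal sketch with illustrative values, where each inner list becomes one row:

```python
import numpy as np

list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d = np.array(list2)

print(arr2d)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
```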
11. The dtype argument
You can specify the data type by setting the dtype argument.
Some of the most commonly used NumPy dtypes are: float, int, bool, str,
and object.
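A sketch that requests floating-point storage even though the input values are integers (values illustrative):

```python
import numpy as np

# dtype is passed as a keyword argument to np.array()
arr2d_f = np.array([[0, 1, 2], [3, 4, 5]], dtype="float")

print(arr2d_f.dtype)  # float64
print(arr2d_f)
# [[0. 1. 2.]
#  [3. 4. 5.]]
```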
12. The astype method
You can also convert an array to a different data type using the astype method.
Remember that, unlike lists, all items in an array have to be of the same
type.
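A sketch of astype() on illustrative values. It returns a converted copy; note that converting floats to int truncates toward zero rather than rounding.

```python
import numpy as np

arr2d_f = np.array([[0.5, 1.7], [2.2, 3.9]])

as_int = arr2d_f.astype("int")  # truncates toward zero
as_str = arr2d_f.astype("str")  # each element becomes a string

print(as_int)
# [[0 1]
#  [2 3]]
print(as_str[0, 0])  # 0.5
print(arr2d_f[1, 1]) # 3.9  (the original array is unchanged)
```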
13. dtype=object
However, if you are uncertain about what data type your array will
hold, or if you want to hold characters and numbers in the same array,
you can set the dtype to 'object'.
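A minimal sketch with illustrative values: with dtype="object" each element keeps its own Python type, whereas without it NumPy coerces everything to a common type (here, strings).

```python
import numpy as np

mixed = np.array([1, "a", 3.5], dtype="object")
print(mixed.dtype)                        # object
print([type(x).__name__ for x in mixed])  # ['int', 'str', 'float']

# Without dtype="object", all three values are converted to strings
coerced = np.array([1, "a", 3.5])
print(coerced.dtype.kind)  # U  (Unicode string)
```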
14. The tolist() method
You can always convert an array into a list using the tolist() method.
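A sketch showing that tolist() returns nested Python lists of plain Python scalars:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
as_list = arr.tolist()

print(as_list)               # [[1, 2], [3, 4]]
print(type(as_list[0][0]))   # <class 'int'>
```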
15. Inspecting a NumPy array
NumPy arrays come with a range of built-in attributes that allow you to
inspect different aspects of an array, such as its shape, data type, size,
and number of dimensions.
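A sketch of those inspection attributes on an illustrative 3 x 4 array:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype="float")

print("Shape:", arr2.shape)          # (3, 4)
print("Datatype:", arr2.dtype)       # float64
print("Size:", arr2.size)            # 12 elements in total
print("Num dimensions:", arr2.ndim)  # 2
```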
16. Extracting specific items from an array
You can extract portions of the array using indices, much like when
you're working with lists.
Unlike lists, however, arrays can accept as many indices in the square
brackets as the array has dimensions.
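A sketch of multidimensional indexing and slicing on illustrative data, using one index (or slice) per dimension inside a single pair of brackets:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]])

# Row 1, column 2
print(arr2[1, 2])  # 5

# First two rows, first two columns
print(arr2[:2, :2])
# [[1 2]
#  [3 4]]
```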
17. Boolean indexing
A boolean index array has the same shape as the array to be filtered,
but it contains only True and False values.
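A sketch on illustrative data: a comparison produces the boolean mask, and indexing with the mask keeps only the elements where it is True (returned as a flat array).

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]])

mask = arr2 > 4
print(mask)
# [[False False False False]
#  [False False  True  True]
#  [ True  True  True  True]]

print(arr2[mask])  # [5 6 5 6 7 8]
```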
18. Pandas
Pandas, like NumPy, is one of the most popular Python libraries for
data analysis.
It is a high-level abstraction built on top of NumPy, whose core is
written in C.
Pandas provides high-performance, easy-to-use data structures and
data analysis tools.
Pandas has two main data structures: data frames and series.
19. Indices in a pandas series
A pandas series is similar to a list, but differs in that a series associates a label with
each element. This makes it look like a dictionary.
If an index is not explicitly provided by the user, pandas creates a RangeIndex ranging from 0
to N-1.
Each series object also has a data type.
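A minimal sketch with illustrative values and no explicit index:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40])

print(series.index)  # RangeIndex(start=0, stop=4, step=1)
print(series.dtype)  # int64
```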
20. As you may suspect by this point, a series has ways to extract all of
the values in the series, as well as individual elements by index.
You can also provide an index manually.
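A sketch of both operations; the labels "a" to "d" are illustrative:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40])
print(series.values)  # [10 20 30 40]  (a NumPy array)
print(series[1])      # 20

# Supplying the index labels manually
labelled = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
print(labelled["c"])  # 30
```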
21. It is easy to retrieve several elements of a series by their indices or
to make group assignments.
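A sketch on an illustrative labelled series: passing a list of labels selects several elements, and assigning to such a selection updates them all at once.

```python
import pandas as pd

labelled = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Retrieve several elements at once
print(labelled[["a", "d"]])

# Group assignment: set several elements in one statement
labelled[["b", "c"]] = 0
print(labelled.tolist())  # [10, 0, 0, 40]
```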
22. Filtering and maths operations
Filtering and maths operations are easy with pandas as well.
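A sketch with illustrative data: a boolean condition filters the series (keeping the labels), and arithmetic is applied element-wise.

```python
import pandas as pd

labelled = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

above_15 = labelled[labelled > 15]
print(above_15)  # keeps b, c and d

doubled = labelled * 2
print(doubled.tolist())  # [20, 40, 60, 80]
```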
23. Pandas data frame
Simplistically, a data frame is a table with rows and columns.
Each column in a data frame is a series object.
Each row consists of the elements at the same position in each of those series.
Case ID | Variable one | Variable two | Variable 3
1 | 123 | ABC | 10
2 | 456 | DEF | 20
3 | 789 | XYZ | 30
24. Creating a Pandas data frame
Pandas data frames can be constructed from Python dictionaries.
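A sketch that builds a data frame modelled on the example table above: each dictionary key becomes a column name and each list becomes that column's values.

```python
import pandas as pd

data = {
    "Variable one": [123, 456, 789],
    "Variable two": ["ABC", "DEF", "XYZ"],
    "Variable 3": [10, 20, 30],
}
df = pd.DataFrame(data)
print(df)
```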
25. You can also create a data frame from a list.
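A sketch with the same illustrative values, this time as a list of rows; the column names are supplied separately.

```python
import pandas as pd

rows = [[123, "ABC", 10], [456, "DEF", 20], [789, "XYZ", 30]]
df = pd.DataFrame(rows, columns=["Variable one", "Variable two", "Variable 3"])
print(df)
```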
26. You can ascertain the type of a column with the type() function.
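A sketch on illustrative data: selecting a single column returns a pandas Series, which type() confirms.

```python
import pandas as pd

df = pd.DataFrame({"Variable one": [123, 456, 789], "Variable 3": [10, 20, 30]})

column = df["Variable one"]
print(type(column))  # a pandas Series
```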
27. A Pandas data frame object has two indices: a column index and a row
index.
Again, if you do not provide one, Pandas will create a RangeIndex from 0
to N-1.
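A sketch on illustrative data showing the two indices:

```python
import pandas as pd

df = pd.DataFrame({"Variable one": [123, 456, 789], "Variable 3": [10, 20, 30]})

print(df.columns)  # the column index: the column labels
print(df.index)    # RangeIndex(start=0, stop=3, step=1)
```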
28. There are numerous ways to provide row indices explicitly.
For example, you could provide an index when creating a data frame,
or do it during runtime.
Here, I also named the index "Country Code".
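A sketch of both approaches. The original data and codes are not recoverable, so the row labels "AA", "BB" and "CC" are hypothetical.

```python
import pandas as pd

data = {"Variable one": [123, 456, 789], "Variable 3": [10, 20, 30]}

# Option 1: supply the row index when the data frame is created
df = pd.DataFrame(data, index=["AA", "BB", "CC"])

# Option 2: assign the index afterwards (at runtime) and give it a name
df2 = pd.DataFrame(data)
df2.index = ["AA", "BB", "CC"]
df2.index.name = "Country Code"
print(df2)
```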
29. Row access using an index can be performed in several ways.
First, you could use .loc[] and provide an index label.
Second, you could use .iloc[] and provide an index number.
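A sketch on illustrative data with hypothetical row labels. Note that .loc and .iloc are indexers, so they take square brackets rather than parentheses.

```python
import pandas as pd

df = pd.DataFrame(
    {"Variable one": [123, 456, 789], "Variable two": ["ABC", "DEF", "XYZ"]},
    index=["AA", "BB", "CC"],
)

by_label = df.loc["BB"]   # row selected by its label
by_position = df.iloc[1]  # the same row, selected by its position
print(by_label)
```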
30. Particular rows and columns can be selected this way.
You can feed .loc[] two arguments, a list of row labels and a list of column
labels. Slicing is supported as well.
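A sketch on illustrative data with hypothetical row labels. Unlike ordinary Python slicing, label-based slicing with .loc includes both endpoints.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Variable one": [123, 456, 789],
        "Variable two": ["ABC", "DEF", "XYZ"],
        "Variable 3": [10, 20, 30],
    },
    index=["AA", "BB", "CC"],
)

# A list of row labels and a list of column labels
subset = df.loc[["AA", "CC"], ["Variable one", "Variable 3"]]

# Slices of labels: rows AA to BB, columns Variable one to Variable two
sliced = df.loc["AA":"BB", "Variable one":"Variable two"]
print(sliced)
```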
32. Deleting columns
You can delete a column using the drop() method.
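A sketch on illustrative data. By default drop() returns a new data frame and leaves the original unchanged; you can assign the result back, or pass inplace=True.

```python
import pandas as pd

df = pd.DataFrame(
    {"Variable one": [123, 456], "Variable two": ["ABC", "DEF"], "Variable 3": [10, 20]}
)

smaller = df.drop(columns=["Variable two"])
# Equivalent older spelling: df.drop("Variable two", axis=1)

print(list(smaller.columns))  # ['Variable one', 'Variable 3']
print(list(df.columns))       # ['Variable one', 'Variable two', 'Variable 3']
```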
33. Reading from and writing to a file
Pandas supports many popular file formats, including CSV, XML, HTML,
Excel, SQL, JSON, etc.
Of all of these, CSV is likely the file format that you will work with the
most.
You can read data from a CSV file using the read_csv() function.
Similarly, you can write a data frame to a CSV file with the to_csv()
method.
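A round-trip sketch on illustrative data, writing to a temporary directory (the file name is hypothetical). index=False stops to_csv() from writing the row index as an extra column.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Variable one": [123, 456, 789], "Variable 3": [10, 20, 30]})

path = os.path.join(tempfile.mkdtemp(), "example.csv")  # hypothetical file name
df.to_csv(path, index=False)

loaded = pd.read_csv(path)
print(loaded)
```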
34. Pandas can do much more than what we have covered
here, such as grouping data and even data visualisation.
However, as with NumPy, we don't have enough time to cover every
aspect of pandas here.
35. Exploratory data analysis (EDA)
Exploring your data is a crucial step in data analysis. It involves:
- Organising the data set
- Plotting aspects of the data set
- Possibly producing some numerical summaries, such as central tendency
and spread
"Exploratory data analysis can never be the whole story, but nothing
else can serve as the foundation stone."
- John Tukey.
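As a sketch of the numerical-summary step, using a hypothetical series of scores: describe() reports count, mean, standard deviation, minimum, quartiles and maximum in one call, and individual reductions are also available.

```python
import pandas as pd

scores = pd.Series([10, 20, 20, 30, 70])  # hypothetical sample

print(scores.describe())
print("Mean:", scores.mean())      # 30.0
print("Median:", scores.median())  # 20.0
print("Std dev:", scores.std())    # sample standard deviation (ddof=1)
```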
36. Download the data
Download the Pokemon dataset from:
https://github.com/LewBrace/da_and_vis_python
Unzip the folder, and save the data file in a location you'll remember.
37. Reading in the data
First we import the Python packages we are going to use.
Then we use Pandas to load the dataset as a data frame.
NOTE: The index_col argument states that we'll treat the first column
of the dataset as the ID column.
NOTE: The encoding argument allows us to bypass an input error caused
by special characters in the data set.
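A self-contained sketch of both arguments. The real file name and encoding from the slides are not recoverable, so an in-memory byte string with made-up rows stands in for the downloaded file, and Latin-1 is an assumed encoding. The "é" in the data is not valid UTF-8 in this encoding, so reading it with the default UTF-8 would fail.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV: hypothetical rows encoded as Latin-1
raw = "ID,Name,Attack\n1,Pokémon A,10\n2,Pokémon B,20\n".encode("latin-1")

# index_col=0: use the first column (ID) as the row index
# encoding="latin-1": decode the special characters correctly
df = pd.read_csv(io.BytesIO(raw), index_col=0, encoding="latin-1")
print(df)

# With a real file this would look like (file name and encoding assumed):
# df = pd.read_csv("Pokemon.csv", index_col=0, encoding="latin-1")
```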
39. We could spend time staring at these
numbers, but that is unlikely to offer
us any form of insight.
We could begin by conducting all of
our statistical tests.
However, a good field commander
never goes into battle without first
doing a reconnaissance of the terrain.
This is exactly what EDA is for.
41. Bins
You may have noticed that the two histograms we've seen so far look different,
despite using exactly the same data.
This is because they have different bin values.
The left graph used the default bins generated by plt.hist(), while the one on the
right used bins that I specified.
42. There are a couple of ways to manipulate bins in Matplotlib.
Here, I specified where the edges of the histogram's bars are: the
bin edges.
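A minimal sketch of specifying bin edges, assuming a short list of made-up attack values. The non-interactive Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() interactively
import matplotlib.pyplot as plt

# Hypothetical attack values, for illustration only.
attack = [49, 62, 82, 100, 52, 64, 84, 130, 48, 63]

# A sequence passed to bins= gives the bin edges explicitly:
# [0, 25), [25, 50), ..., [125, 150] (the last bin includes its right edge).
edges = [0, 25, 50, 75, 100, 125, 150]
counts, bin_edges, patches = plt.hist(attack, bins=edges)
print(counts)
```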
43. You could also specify the number of bins, and Matplotlib will automatically
generate that many evenly spaced bins.
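Passing an integer instead gives that many equal-width bins. The sketch below reuses the same made-up values; the bin count of 4 is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

attack = [49, 62, 82, 100, 52, 64, 84, 130, 48, 63]  # hypothetical values

# An integer asks Matplotlib for that many equal-width bins spanning the
# range of the data (here from 48 to 130, so each bin is 20.5 wide).
counts, bin_edges, patches = plt.hist(attack, bins=4)
print(bin_edges)
```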
44. Seaborn
Matplotlib is a powerful, but sometimes unwieldy, Python library.
Seaborn provides a high-level interface to Matplotlib and makes it easier
to produce graphs like the one on the right.
Some IDEs incorporate elements of this under the hood nowadays.
45. Benefits of Seaborn
Seaborn offers:
- Default themes that are aesthetically pleasing.
- Custom colour palettes.
- Attractive statistical plots.
- Easy and flexible display of distributions.
- Visualisation of information from matrices and DataFrames.
The last three points have led to Seaborn becoming the exploratory
data analysis tool of choice for many Python users.
46. Plotting with Seaborn
One of Seaborn's greatest strengths is its diversity of plotting
functions.
Most plots can be created with one line of code.
For example.
48. Other types of graphs: Creating a scatter plot
Seaborn's linear model plot function, lmplot(), is used to create a scatter graph.
y= takes the name of the variable we want on the y-axis.
x= takes the name of the variable we want on the x-axis.
data= takes the name of our dataframe.
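A minimal sketch of this call, assuming a small hand-made data frame with hypothetical Attack and Defense columns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

# Hypothetical stats, for illustration only.
df = pd.DataFrame({
    "Attack": [49, 62, 82, 52, 64, 84, 48, 63, 83],
    "Defense": [49, 63, 83, 43, 58, 78, 65, 80, 100],
})

# x= and y= name the columns for each axis; data= is the data frame.
# lmplot() draws the points plus a fitted regression line.
g = sns.lmplot(x="Attack", y="Defense", data=df)
```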
49. Older versions of Seaborn (before 0.9) didn't have a dedicated scatter plot function,
so we used Seaborn's function for fitting and plotting a regression line;
hence lmplot(). Newer versions also provide scatterplot().
However, Seaborn makes it easy to alter plots.
To remove the regression line, we use the fit_reg=False argument.
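The sketch below, using the same kind of hypothetical data frame, removes the line with fit_reg=False and, for comparison, draws the equivalent plot with scatterplot(), which assumes Seaborn 0.9 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Attack": [49, 62, 82, 52, 64, 84, 48, 63, 83],
    "Defense": [49, 63, 83, 43, 58, 78, 65, 80, 100],
})

# fit_reg=False keeps the points but skips the regression line and its band.
g = sns.lmplot(x="Attack", y="Defense", data=df, fit_reg=False)

# The dedicated scatter plot function, drawn on a fresh figure.
plt.figure()
ax = sns.scatterplot(x="Attack", y="Defense", data=df)
```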
50. The hue argument
Another useful feature of Seaborn is the hue argument, which enables
us to use a variable to colour code our data points.
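A sketch of colour coding by a hypothetical Stage column, with the regression lines switched off as before:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

# Hypothetical stats with an evolution stage for each row.
df = pd.DataFrame({
    "Attack": [49, 62, 82, 52, 64, 84, 48, 63, 83],
    "Defense": [49, 63, 83, 43, 58, 78, 65, 80, 100],
    "Stage": [1, 2, 3, 1, 2, 3, 1, 2, 3],
})

# hue="Stage": each stage gets its own colour (and a legend entry).
g = sns.lmplot(x="Attack", y="Defense", data=df, hue="Stage", fit_reg=False)
```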
51. Factor plots
Factor plots make it easy to separate plots by categorical classes.
(Since Seaborn 0.9, factorplot() has been renamed catplot().)
Colour by stage.
Separate by stage.
Generate using a swarm plot.
Rotate the x-tick labels by 45 degrees.
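A sketch of the steps listed above using catplot(), the name factorplot() has had since Seaborn 0.9. The Type 1, Attack and Stage columns and their values are hypothetical, and the x-tick labels are rotated with Matplotlib's tick_params().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

# Hypothetical data: three types, three evolution stages each.
df = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water"] * 3,
    "Attack": [49, 52, 48, 62, 64, 63, 82, 84, 83],
    "Stage": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

g = sns.catplot(
    x="Type 1", y="Attack", data=df,
    hue="Stage",   # colour by stage
    col="Stage",   # separate by stage: one panel per stage
    kind="swarm",  # draw each panel as a swarm plot
)

# Rotate the x-tick labels by 45 degrees on every panel.
for ax in g.axes.flat:
    ax.tick_params(axis="x", labelrotation=45)
```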
54. The Total, Stage, and Legendary entries are not combat stats, so we should remove them.
Pandas makes this easy: we just use the .drop() function to create a new dataframe that doesn't include the
variables we don't want.
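A minimal sketch, assuming a hypothetical data frame that contains Total, Stage and Legendary columns alongside the combat stats:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Total": [318, 309],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Stage": [1, 1],
    "Legendary": [False, False],
})

# axis=1 means "drop columns, not rows". drop() returns a new data frame
# and leaves df itself unchanged.
stats_df = df.drop(["Total", "Stage", "Legendary"], axis=1)
print(stats_df)
```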
55. Seaborn's themes
Seaborn has a number of themes you can use to alter the appearance
of plots.
For example, we can use the whitegrid theme to add grid lines to our box plot.
56. Violin plots
Violin plots are useful alternatives to box plots.
They show the distribution of a variable through the thickness of the violin.
Here, we visualise the distribution of attack by Pokémon's primary type:
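A minimal sketch, assuming hypothetical Type 1 and Attack columns with made-up values for two types:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "Type 1": ["Dragon"] * 4 + ["Ghost"] * 4,
    "Attack": [64, 84, 134, 70, 35, 50, 65, 40],
})

# One violin per primary type; its width shows where values are concentrated.
ax = sns.violinplot(x="Type 1", y="Attack", data=df)
```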
57. Dragon types tend to have higher Attack stats than Ghost types, but they also have greater
variance. However, something isn't right here:
the colours!
58. Seaborn's colour palettes
Seaborn allows us to easily set custom colour palettes by providing it
with an ordered list of colour hex values.
We first create our colours list.
59. Then we just use the palette= argument and feed in our colours list.
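A sketch with a hypothetical two-colour list, one hex value per type in the order the types appear on the x-axis. Recent Seaborn releases may emit a FutureWarning when palette= is passed without hue=; the plot is still drawn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Type 1": ["Dragon"] * 4 + ["Ghost"] * 4,
    "Attack": [64, 84, 134, 70, 35, 50, 65, 40],
})

# Hypothetical colours list: one hex value per type, in x-axis order.
type_colours = ["#7038F8", "#705898"]

ax = sns.violinplot(x="Type 1", y="Attack", data=df, palette=type_colours)
```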
60. Because of the limited number of observations, we could also use a
swarm plot.
Here, each data point is an observation, but data points are grouped
together by the variable listed on the x-axis.
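A minimal swarm-plot sketch on the same kind of hypothetical data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Type 1": ["Dragon"] * 4 + ["Ghost"] * 4,
    "Attack": [64, 84, 134, 70, 35, 50, 65, 40],
})

# One point per observation, grouped (and spread sideways) by Type 1.
ax = sns.swarmplot(x="Type 1", y="Attack", data=df)
```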
61. Overlapping plots
Both of these show similar information, so it might be useful to
overlap them.
Set the size of the figure canvas.
Remove the bars from inside the violins.
Make the swarm points black and slightly transparent.
Give the graph a title.
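The steps above might look like this, assuming the same hypothetical columns. The 10 x 6 inch figure size and the title text are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Type 1": ["Dragon"] * 4 + ["Ghost"] * 4,
    "Attack": [64, 84, 134, 70, 35, 50, 65, 40],
})

plt.figure(figsize=(10, 6))  # set the size of the figure canvas

# Violins without the inner box-plot bars.
ax = sns.violinplot(x="Type 1", y="Attack", data=df, inner=None)

# Overlay the individual points: black ("k") and slightly transparent.
sns.swarmplot(x="Type 1", y="Attack", data=df, color="k", alpha=0.7, ax=ax)

ax.set_title("Attack by Type")  # give the graph a title
```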
63. Data wrangling with Pandas
What if we wanted to create such a plot that included all of the other
stats as well?
In our current dataframe, all of the variables are in different columns:
64. If we want to visualise all stats, then we'll have to melt the
dataframe.
We use the .drop() function again to re-create the dataframe without these three
variables.
The dataframe we want to melt.
The variables to keep; all others will be
melted.
A name for the new, melted, variable.
All 6 of the stat columns have been "melted" into one, and
the new Stat column indicates the original stat (HP, Attack,
Defense, Sp. Attack, Sp. Defense, or Speed).
It's hard to see here, but each Pokemon now has 6 rows of
data; hence melted_df has 6 times as many rows as the original.
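A minimal melt sketch, assuming a hypothetical two-row stats_df whose six stat columns use the names shown in the code:

```python
import pandas as pd

# Hypothetical stats for two Pokemon, for illustration only.
stats_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Type 1": ["Grass", "Fire"],
    "Type 2": ["Poison", None],
    "HP": [45, 39], "Attack": [49, 52], "Defense": [49, 43],
    "Sp. Atk": [65, 60], "Sp. Def": [65, 50], "Speed": [45, 65],
})

melted_df = pd.melt(
    stats_df,                              # the dataframe we want to melt
    id_vars=["Name", "Type 1", "Type 2"],  # variables to keep as they are
    var_name="Stat",                       # name for the new, melted, variable
)
# The stat values land in a column named "value" (pandas' default).
print(melted_df.head(8))
```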
66. This graph could be made to look nicer with a few tweaks.
Enlarge the plot.
Separate points by hue.
Use our special Pokemon colour palette.
Adjust the y-axis.
Move the legend box outside the graph
and place it to the right of
the graph.
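A sketch of these tweaks on a hypothetical melted data frame. The y-axis range, palette and figure size are illustrative, and sns.move_legend() assumes Seaborn 0.11.2 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stats for two Pokemon, melted into long format.
stats_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Type 1": ["Grass", "Fire"],
    "HP": [45, 39], "Attack": [49, 52], "Defense": [49, 43],
    "Sp. Atk": [65, 60], "Sp. Def": [65, 50], "Speed": [45, 65],
})
melted_df = pd.melt(stats_df, id_vars=["Name", "Type 1"], var_name="Stat")

type_colours = ["#78C850", "#F08030"]  # hypothetical: one colour per type

plt.figure(figsize=(10, 6))            # enlarge the plot
ax = sns.swarmplot(
    x="Stat", y="value", data=melted_df,
    hue="Type 1",                      # colour points by primary type
    dodge=True,                        # separate the hue levels side by side
    palette=type_colours,              # our custom colour palette
)
ax.set_ylim(0, 260)                    # adjust the y-axis
# Move the legend outside the axes, anchored to the right of the graph.
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
```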
68. Plotting all data: Empirical cumulative
distribution functions (ECDFs)
An alternative way of visualising a
distribution of a variable in a large dataset
is to use an ECDF.
Here we have an ECDF that shows the
percentages of different attack strengths of
Pokemon.
The x-value of an ECDF is the quantity you
are measuring, i.e. attack strength.
The y-value is the fraction of data points
that have a value smaller than or equal to the
corresponding x-value. For example:
69. 20% of Pokemon have an attack
level of 50 or less.
75% of Pokemon have an attack
level of 90 or less.
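An ECDF is simple to compute by hand: sort the values, then give the i-th smallest value a height of i/n. A minimal NumPy sketch, assuming made-up attack values; ecdf() is a hypothetical helper, not a library function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def ecdf(values):
    """Hypothetical helper: return sorted values and their ECDF heights."""
    x = np.sort(np.asarray(values))
    # With distinct values, y[i] = (i + 1) / n is the fraction of points <= x[i].
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


attack = np.array([49, 62, 82, 100, 52, 64, 84, 130, 48, 63])  # made-up values
x, y = ecdf(attack)

plt.plot(x, y, marker=".", linestyle="none")  # points only, no joining line
plt.xlabel("Attack")
plt.ylabel("ECDF")

# Reading the plot: the height at x = 64 is the share of values <= 64.
print(y[x == 64][0])  # 0.6
```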
71. You can also plot multiple ECDFs
on the same plot.
As an example, here we have an
ECDF for Pokemon attack, speed,
and defence levels.
We can see here that defence
levels tend to be a little lower than
the other two.
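Reusing the hypothetical ecdf() helper, several ECDFs can be overlaid with one plot call per column (made-up values again):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def ecdf(values):
    """Hypothetical helper: return sorted values and their ECDF heights."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up stats for six Pokemon.
df = pd.DataFrame({
    "Attack": [49, 62, 82, 52, 64, 84],
    "Speed": [45, 60, 80, 65, 80, 100],
    "Defense": [49, 63, 83, 43, 58, 78],
})

fig, ax = plt.subplots()
for column in ["Attack", "Speed", "Defense"]:
    x, y = ecdf(df[column])
    ax.plot(x, y, marker=".", linestyle="none", label=column)  # one ECDF per stat
ax.set_xlabel("Stat value")
ax.set_ylabel("ECDF")
ax.legend()
```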
72. The usefulness of ECDFs
It is often quite useful to plot the ECDF first as part of your workflow.
It shows all the data and gives a complete picture as to how the data
are distributed.
73. Heatmaps
Useful for visualising matrix-like data.
Here, we'll plot the correlation of the stats_df variables.
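A minimal sketch, assuming a hypothetical stats_df. select_dtypes("number") keeps only the numeric columns, since recent pandas versions raise an error when asked to correlate text columns such as Name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

# Hypothetical stats, for illustration only.
stats_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Charmeleon"],
    "HP": [45, 60, 39, 58],
    "Attack": [49, 62, 52, 64],
    "Speed": [45, 60, 65, 80],
})

# Pairwise Pearson correlations between the numeric columns.
corr = stats_df.select_dtypes("number").corr()

# Colour each cell by its correlation; annot=True writes the value in the cell.
ax = sns.heatmap(corr, annot=True)
```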
74. Bar plot
Visualises the distributions of categorical variables.
Rotate the x-tick labels by 45 degrees.
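The slide does not name the function; a natural fit is sns.countplot(), which draws one bar per category with the bar height equal to its count. A sketch on a hypothetical Type 1 column:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical primary types, for illustration only.
df = pd.DataFrame({"Type 1": ["Grass", "Grass", "Fire", "Water", "Water", "Water"]})

ax = sns.countplot(x="Type 1", data=df)  # one bar per type, height = count
plt.xticks(rotation=45)                  # rotate the x-tick labels by 45 degrees
```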
75. Joint Distribution Plot
Joint distribution plots combine information from scatter plots and
histograms to give you detailed information about bivariate distributions.