Elements of Inference covers the following concepts and picks up where the previous presentation, /GiridharChandrasekar1/statistics1-the-basics-of-statistics, left off:
Population Vs Sample (Measures)
Probability
Random Variables
Probability Distributions
Statistical Inference: The Concept
This document provides an introduction to inferential statistics. It defines key terms like probability, random variables, and probability distributions such as the normal distribution. It discusses how inferential statistics can be used to make generalizations about populations based on samples. Hypothesis testing is introduced as a core technique in inferential statistics for testing proposed relationships. Concepts discussed in more depth include the normal distribution, parameters like the mean and standard deviation, sampling error, confidence intervals, and significance levels.
2. Probability distribution
A probability distribution is a function that gives the likelihood of occurrence of all possible outcomes of an experiment.
Categories:
- Discrete probability distribution
- Continuous probability distribution
Functions used to describe a probability distribution:
- Probability mass function (discrete)
- Probability density function (continuous)
A random variable is a variable that represents a numerical outcome of a random experiment. Hence a probability distribution function gives the probability of all the possible values that a random variable can take.
A random variable may be discrete or continuous.
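To make these definitions concrete, here is a minimal Python sketch (mine, not from the slides): the PMF of a fair six-sided die, a discrete random variable, stored as a dictionary.

```python
# Minimal sketch: the PMF of a fair six-sided die.
# X is a discrete random variable taking values 1..6, each with probability 1/6.
pmf = {x: 1 / 6 for x in range(1, 7)}

assert abs(sum(pmf.values()) - 1.0) < 1e-9  # probabilities must sum to 1
print(pmf[3])  # P(X = 3) = 1/6
```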
3. Why is probability distribution significant?
- Distributions show all the possible values for a set of data and how often they occur.
- Distributions of data display the spread and shape of the data.
- They help in standardized comparisons and analysis.
- Data exhibiting a well-defined distribution have predefined statistical attributes (for example, in a normal distribution, Mean = Median = Mode).
4. Probability Distribution Function
The probability distribution function is also known as the cumulative distribution function (CDF).
If there is a random variable, X, and its value is evaluated at a point, x, then the probability distribution function gives the probability that X will take a value less than or equal to x. It can be written as
F(x) = P(X ≤ x)
The probability distribution function can be used for both discrete and continuous variables.
5. Probability Distribution Function (Example)
Let the random variable X represent the number of heads obtained in two tosses of a coin.
Sample space: {HH, HT, TH, TT}
Probability distribution function:

No. of heads:  0    1    2    Sum
PDF, P(X):     1/4  1/2  1/4  1

Probability of obtaining at most one head:
P(X ≤ 1) = P(X = 0) + P(X = 1) = 1/4 + 1/2 = 3/4
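The table can also be checked in code. A small sketch (assuming SciPy is available): the number of heads in two fair tosses follows a Binomial(n=2, p=0.5) distribution, so scipy.stats.binom reproduces both the PMF row and P(X ≤ 1).

```python
# Sketch: number of heads in two fair coin tosses ~ Binomial(n=2, p=0.5).
from scipy.stats import binom

print(binom.pmf([0, 1, 2], n=2, p=0.5))  # [0.25 0.5  0.25], matching the table
print(binom.cdf(1, n=2, p=0.5))          # P(X <= 1) = 0.75
```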
6. Probability distribution of a discrete random variable
A discrete random variable can be defined as a variable that can take a countable number of distinct values like 0, 1, 2, 3, ...
Probability Mass Function: p(x) = P(X = x)
Probability Distribution Function: F(x) = P(X ≤ x)
Examples of discrete probability distributions:
- Binomial distribution
- Bernoulli distribution
- Poisson distribution
7. Probability distribution of a discrete random variable
https://www.youtube.com/watch?v=YXLVjCKVP7U&ab_channel=zedstatistics
8. Probability Distribution of a Continuous Random Variable
A continuous random variable can be defined as a variable that can take on infinitely many values.
The probability that a continuous random variable will take on an exact value is 0.
Probability Distribution Function: F(x) = P(X ≤ x)
Probability Density Function: f(x) = d/dx F(x)
Examples of continuous probability distributions:
- Normal distribution
- Uniform distribution
- Exponential distribution
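The density-as-derivative relationship can be illustrated numerically. A sketch of mine, using the standard normal distribution as the example: a central difference of the CDF approximates the PDF.

```python
# Sketch: f(x) = d/dx F(x), checked by a central difference on the standard normal.
from scipy.stats import norm

x, h = 1.0, 1e-6
numeric_pdf = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
print(numeric_pdf)  # ~0.24197
print(norm.pdf(x))  # ~0.24197, the exact density at x = 1
```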
10. Bernoulli Distribution
A Bernoulli distribution has only two possible outcomes, namely 1 (success) and 0 (failure), and a single trial.
The random variable X can take the following values:
- 1 with the probability of success, p
- 0 with the probability of failure, q = 1 - p
Probability mass function (PMF): P(x) = p^x (1 - p)^(1 - x) for x in {0, 1}, i.e. P(1) = p and P(0) = q
Expected value (mean) = p
Variance = p·q
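A short sketch (assuming SciPy) confirming the Bernoulli mean and variance formulas for an arbitrary p:

```python
# Sketch: Bernoulli(p) has mean p and variance p * (1 - p).
from scipy.stats import bernoulli

p = 0.3
print(bernoulli.pmf([0, 1], p))  # [0.7 0.3]: P(failure), P(success)
print(bernoulli.mean(p))         # 0.3  = p
print(bernoulli.var(p))          # 0.21 = p * q
```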
12. Binomial distribution
When multiple trials of an experiment that yields a success/failure outcome (a Bernoulli distribution) are conducted, the number of successes exhibits a binomial distribution.
PMF: P(X = x) = nCx · p^x · q^(n - x)
where n = number of trials, x = number of successes, p = probability of success, q = probability of failure = 1 - p
Expected value = n·p
Variance = n·p·q
13. Binomial distribution (Example)
A store manager estimates the probability of a customer making a purchase as 0.30. What is the probability that two of the next three customers will make a purchase?
Solution:
The above exhibits a binomial distribution, as there are three customers (3 trials), with every customer either making a purchase (success) or not making a purchase (failure).
Probability that two of the next three customers will make a purchase:
P(X = 2) = 3C2 · (0.30)^2 · (0.70)^1 = 3 · 0.09 · 0.70 = 0.189
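The hand calculation can be verified with scipy.stats.binom (a sketch, not part of the original example):

```python
# Sketch: verifying the store example, n = 3 customers, p = 0.30 per purchase.
from scipy.stats import binom

print(binom.pmf(2, n=3, p=0.30))  # 0.189, matching the hand calculation
```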
14. Normal distribution
In a normal distribution the data tend to be around a central value with no bias left or right.
Also called a bell curve, as it looks like a bell.
Many things follow a normal distribution: heights of people, marks scored in a test.
15. Normal distribution
Mean = Median = Mode
68% of data lie within one standard deviation of the mean.
95% of data lie within two standard deviations of the mean.
https://www.mathsisfun.com/data/standard-normal-distribution.html
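These percentages follow directly from the normal CDF; a quick sketch of mine using scipy.stats.norm:

```python
# Sketch: the empirical rule from the standard normal CDF.
from scipy.stats import norm

print(norm.cdf(1) - norm.cdf(-1))  # ~0.6827: within 1 standard deviation
print(norm.cdf(2) - norm.cdf(-2))  # ~0.9545: within 2 standard deviations
```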
16. Skewness
Negative skew: The long tail is on the negative side of the peak
Positive skew: The long tail is on the positive side of the peak
https://www.mathsisfun.com/data/skewness.html
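Skewness can also be measured numerically. A sketch of mine: scipy.stats.skew on an exponential sample, whose long right tail gives a positive value.

```python
# Sketch: a sample with a long right tail has positive skewness.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=10_000)
print(skew(sample))  # positive (~2 for an exponential) => positively skewed
```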
17. Uniform distribution
In a uniform distribution there is an equal probability for all values of the random variable between a and b.
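A sketch (using SciPy's convention loc = a, scale = b - a) of a Uniform(2, 5) distribution:

```python
# Sketch: Uniform(a, b); every value in [a, b] has the same density 1 / (b - a).
from scipy.stats import uniform

a, b = 2.0, 5.0
u = uniform(loc=a, scale=b - a)
print(u.pdf(3.0))  # 1 / (b - a) ~ 0.333
print(u.cdf(3.5))  # P(X <= 3.5) = (3.5 - a) / (b - a) = 0.5
```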
18. Relationship between two variables
Covariance and correlation are two statistical measures that describe the relationship between two variables.
They both quantify how two variables change together, but they differ in scale, interpretation, and units.
19. Covariance
Covariance measures the direction of the linear relationship between two variables.
It tells you whether the variables move in the same direction (positive covariance) or in opposite directions (negative covariance).
20. Covariance (Example)
Covariance between temperature and ice cream sales: Cov(X, Y) = 243.
The positive value indicates a positive relationship between temperature and ice cream sales: they move in the same direction.
However, covariance does not specify the strength of the relationship.
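The slide's underlying data are not shown, so the sketch below uses made-up temperature and sales numbers purely to demonstrate the NumPy calculation; only the sign, not the value 243, is meant to match.

```python
# Sketch with hypothetical data: covariance of temperature vs. ice cream sales.
import numpy as np

temp = np.array([20, 24, 28, 31, 35])        # hypothetical temperatures (deg C)
sales = np.array([120, 135, 160, 180, 210])  # hypothetical units sold
print(np.cov(temp, sales)[0, 1])  # positive => the variables move together
```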
21. Correlation
Correlation measures both the strength and direction of the linear relationship between two variables.
It lies within a standardized range, from -1 to +1:
- +1: perfect positive correlation
- -1: perfect negative correlation
- 0: no correlation
[Figure: scatter plot illustrating a perfect positive correlation]
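Using the same hypothetical data as above, the correlation coefficient standardizes the covariance into the [-1, +1] range (again a sketch, not the slide's own numbers):

```python
# Sketch: Pearson correlation of the same hypothetical data.
import numpy as np

temp = np.array([20, 24, 28, 31, 35])
sales = np.array([120, 135, 160, 180, 210])
print(np.corrcoef(temp, sales)[0, 1])  # close to +1 => strong positive linear relationship
```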
24. Exploratory Data Analysis (EDA)
Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
Key objectives of EDA:
- Understand the data structure: gain insights into the data's size, types, and completeness.
- Identify patterns: detect trends, correlations, and groupings.
- Find anomalies: spot outliers and inconsistencies in the data.
- Generate hypotheses: form initial ideas for models, statistical testing, or predictions.
- Refine data: clean, transform, or filter the data for further analysis.
25. Steps in EDA
1. Data loading and inspection
2. Univariate analysis
3. Bivariate analysis
4. Multivariate analysis
5. Identifying missing values and outliers
6. Data transformation
7. Feature engineering
8. Hypothesis generation
26. Data loading and inspection
Step 1. Load data into the workspace.
Step 2. Data preview and summary: the df.head() command displays the first few records.
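A sketch of this step with pandas; "data.csv" is a placeholder file name, not one used on the slides:

```python
# Sketch: load a dataset and inspect it, as in steps 1-2 above.
import pandas as pd

df = pd.read_csv("data.csv")  # placeholder path
print(df.head())              # first few records
df.info()                     # column types and non-null counts (prints directly)
print(df.describe())          # summary statistics for numeric columns
```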
27. Univariate analysis
Involves analyzing each variable individually to understand its distribution, central tendency, and spread.
- Numerical variables: histograms, box plots, and summary statistics (mean, median, standard deviation)
- Categorical variables: bar charts, pie charts
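A sketch of univariate analysis with pandas and matplotlib; the column names "age" and "gender" are hypothetical:

```python
# Sketch: univariate analysis of one numerical and one categorical variable.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")        # placeholder path
print(df["age"].describe())         # mean, std, quartiles: centre and spread
df["age"].plot(kind="hist")         # histogram of a numerical variable
plt.show()
print(df["gender"].value_counts())  # frequencies of a categorical variable
```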