EVALUATION OF PROGRAMS
CODES USING MACHINE
LEARNING
SUBMITTED BY
MANAS CHHABRA 2K12/SE/041
ROHIT PAL 2K12/SE/066
TANMAY AGGARWAL 2K12/SE/
OBJECTIVE
To detect copied code submitted by different users on online judges.
HOW CAN WE SAY WHETHER TWO CODES ARE COPIED?
- Traditional way: write a program that compares every code submitted by a user with every other code in the database.
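Here is a minimal sketch of this traditional all-pairs approach. It assumes Python's standard difflib.SequenceMatcher as the similarity measure, because the slides do not say which comparison is used. The user names and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def naive_pairwise_similarity(submissions, threshold=0.9):
    """Compare every submission with every other one (the traditional way).

    `submissions` maps a (hypothetical) user id to that user's source code.
    Returns (user_a, user_b, ratio) for every pair at or above `threshold`.
    """
    suspicious = []
    # combinations() yields each unordered pair exactly once: n * (n - 1) / 2 pairs.
    for (user_a, code_a), (user_b, code_b) in combinations(submissions.items(), 2):
        # Line-based comparison; ratio() is 1.0 for identical line sequences.
        ratio = SequenceMatcher(None, code_a.splitlines(), code_b.splitlines()).ratio()
        if ratio >= threshold:
            suspicious.append((user_a, user_b, round(ratio, 2)))
    return suspicious


subs = {
    "alice": "int a;\ncin >> a;\ncout << a * 2;",
    "bob": "int a;\ncin >> a;\ncout << a * 2;",
    "carol": "x = int(input())\nprint(x + x)",
}
print(naive_pairwise_similarity(subs))  # [('alice', 'bob', 1.0)]
```

Every pair is visited, so the number of comparisons grows quadratically with the number of submissions.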
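THEN WHAT IS THE PROBLEM WITH THIS METHOD?
Assume that 4000 users solve a given programming question and that every submission has 30 lines of code. Comparing every submission with every other one means 4000 × 3999 / 2 = 7,998,000 pairs of programs. Each of those comparisons has to examine the lines of both files, so the cost grows quadratically with the number of users.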
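The arithmetic can be checked with a few lines. Two figures are illustrative assumptions, not taken from the slides: the 30 x 30 line comparisons per pair, and the split into 10 equal groups at the end.

```python
def pair_count(n):
    """Number of unordered pairs among n submissions: n * (n - 1) / 2."""
    return n * (n - 1) // 2


USERS = 4000          # users solving one question (from the slide)
LINES_PER_CODE = 30   # lines in each submission (from the slide)

pairs = pair_count(USERS)
print(pairs)  # 7998000

# Assumption: a naive check compares every line of one file with every line
# of the other, i.e. 30 * 30 line comparisons per pair.
print(pairs * LINES_PER_CODE * LINES_PER_CODE)  # 7198200000

# Assumption: if the submissions were split into 10 equal groups and codes were
# compared only within their own group, far fewer pairs would remain.
GROUPS = 10
print(GROUPS * pair_count(USERS // GROUPS))  # 798000
```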
WE CAN RESOLVE THIS COMPLEXITY USING MACHINE LEARNING
Before we explain how we reduced this complexity, let's first look at the machine learning concepts we have applied.
WHAT IS MACHINE LEARNING?
- According to Arthur Samuel, machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.
- According to Tom Mitchell, it is a well-posed learning problem: a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
MACHINE LEARNING ALGORITHMS
- Supervised learning
- Unsupervised learning
WE ARE GOING TO USE UNSUPERVISED MACHINE LEARNING
Clustering: learning from unlabeled data.
- Unsupervised learning tries to determine structure in the data
- A clustering algorithm groups data together based on the data's features
WHAT IS CLUSTERING GOOD FOR?
- Market segmentation: grouping customers into different market segments
- Social network analysis: e.g. Facebook "smartlists"
- Organizing computer clusters and data centers (network layout and location)
- Astronomical data analysis: understanding galaxy formation
K-MEANS ALGORITHM
- Used to automatically group the data into coherent clusters
- e.g. take an unlabeled data set that we want to split into two clusters:
- Step 1: Randomly allocate two points as the cluster centroids
- Step 2 (cluster assignment): go through each example and, depending on whether it is closer to the red or the blue centroid, assign it to one of the two clusters
- Step 3 (move centroid): take each centroid and move it to the average of the data points assigned to it
- Repeat Step 2 and Step 3 until convergence
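The three steps can be sketched in dependency-free Python. This sketch assumes Euclidean distance and a fixed random seed, so runs are reproducible. The function names and sample points are hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, max_iters=100, seed=0):
    """Plain K-means on equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # Step 2 (cluster assignment): each point goes to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Step 3 (move centroid): each centroid moves to the mean of its points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    return centroids, [nearest(p, centroids) for p in points]


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

The result depends on the random initial centroids. For that reason K-means is often run several times with different seeds, and the clustering with the lowest total squared distance is kept.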
MORE FORMAL DEFINITION
Input:
- K (number of clusters in the data)
- Training set {x1, x2, x3, ..., xn}
Algorithm:
- Randomly initialize K cluster centroids {μ1, μ2, μ3, ..., μK}
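The slide stops after the initialization step. In the standard formulation, the loop that follows alternates the two steps from the previous slide. Here c_i is the index of the centroid assigned to x_i, and C_k is the set of examples assigned to cluster k; both symbols are our notation, not the slides':

```latex
\begin{aligned}
&\text{Repeat until convergence:}\\
&\quad c_i := \arg\min_{k \in \{1,\dots,K\}} \lVert x_i - \mu_k \rVert^2, && i = 1,\dots,n \\
&\quad \mu_k := \frac{1}{\lvert C_k \rvert} \sum_{i \in C_k} x_i, \quad C_k = \{\, i : c_i = k \,\}, && k = 1,\dots,K
\end{aligned}
```

Neither step can increase the total squared distance \sum_i \lVert x_i - \mu_{c_i} \rVert^2. This is why the loop converges, although only to a local minimum.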
DIMENSIONALITY REDUCTION
- Speeds up algorithms
- Reduces the space needed to store the data
- Reduces the dimension from nD to kD, e.g. from 3D to 2D
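Here is a small sketch of what going from 3D to 2D means in practice. Each 3D point is replaced by its two coordinates along two orthonormal vectors that span a plane. The plane is chosen by hand as an assumption for illustration; PCA, described next, is the method that chooses it from the data. The sample points happen to lie on this plane, so nothing is lost.

```python
import math


def project(point, basis):
    """Coordinates of `point` along each orthonormal vector u in `basis` (z_j = u_j . x)."""
    return tuple(sum(p * u for p, u in zip(point, vec)) for vec in basis)


def reconstruct(coords, basis):
    """Map the reduced coordinates back into the original space: x_approx = sum_j z_j * u_j."""
    dim = len(basis[0])
    return tuple(sum(z * vec[d] for z, vec in zip(coords, basis)) for d in range(dim))


s = 1 / math.sqrt(2)
# Hypothetical plane in 3D spanned by two orthonormal vectors.
basis = [(1.0, 0.0, 0.0), (0.0, s, s)]

points_3d = [(2.0, 3.0, 3.0), (1.0, -1.0, -1.0), (0.0, 0.5, 0.5)]
points_2d = [project(p, basis) for p in points_3d]  # 2 numbers per point instead of 3
print(points_2d)
print([reconstruct(z, basis) for z in points_2d])   # recovers the 3D points here
```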
PRINCIPAL COMPONENT ANALYSIS (PCA)
- To reduce from nD to kD, we find k vectors (u(1), u(2), ..., u(k)) onto which to project the data so that the projection error (the average squared distance between each point and its projection) is minimized
- In other words, we find a set of vectors and project the data onto the linear subspace spanned by that set
- Any point in that subspace can be described by k coordinates, one along each vector
- e.g. for 3D -> 2D, find a pair of vectors that define a 2D plane (surface) onto which the data is projected
- Much like the "shallow box" example in compression, we are trying to create the shallowest box possible (by defining two of its three dimensions, so that the box's depth is minimized)
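The "shallowest box" idea can be made concrete by measuring the average squared projection error for two candidate planes. This sketch uses made-up, mean-normalized points. The plane that keeps the large spread of the data and drops only the thin direction has the smaller error, and it is the plane PCA would pick here.

```python
def projection_error(points, basis):
    """Average squared distance between each point and its projection onto the
    subspace spanned by the orthonormal vectors in `basis`."""
    total = 0.0
    for x in points:
        coords = [sum(xi * ui for xi, ui in zip(x, u)) for u in basis]
        # For an orthonormal basis, ||x - proj(x)||^2 = ||x||^2 - sum_j z_j^2.
        total += sum(xi * xi for xi in x) - sum(z * z for z in coords)
    return total / len(points)


# Mean-normalized 3D sample that is almost flat in the z direction.
points = [(2.0, 1.0, 0.1), (-2.0, -1.0, -0.1), (1.0, -2.0, 0.1), (-1.0, 2.0, -0.1)]

xy_plane = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
xz_plane = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
e_xy = projection_error(points, xy_plane)  # small: only the thin z "depth" is lost
e_xz = projection_error(points, xz_plane)  # large: the whole y spread is lost
print(e_xy, e_xz)
```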
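ALGORITHM
- Reducing data from n dimensions to k dimensions (assuming the features have already been mean-normalized)
- Compute the covariance matrix:
$\Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{T}$
- It is commonly denoted Σ (Greek upper-case sigma), which is NOT the summation symbol; in code it is written sigma
- Σ is an [n x n] matrix, since each x(i) is an [n x 1] matrix and m is the number of training examples
- In MATLAB or Octave, with the examples stored as the rows of an [m x n] matrix X, this can be implemented as: sigma = (1/m) * X' * X;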
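The same step can be sketched in dependency-free Python; the function names are hypothetical. The sketch first performs the mean normalization that the covariance formula assumes, then computes the [n x n] covariance matrix Σ described above.

```python
def mean_normalize(X):
    """Subtract each feature's mean so that every column of X averages to zero."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    return [[row[j] - means[j] for j in range(n)] for row in X]


def covariance_matrix(X):
    """Sigma = (1/m) * sum_i x(i) x(i)^T, for mean-normalized rows x(i) of X.

    Entry [a][b] is the average of x_a * x_b over the m examples, which is
    what (1/m) * X' * X computes when the examples are the rows of X.
    """
    m, n = len(X), len(X[0])
    return [[sum(row[a] * row[b] for row in X) / m for b in range(n)] for a in range(n)]


X = mean_normalize([[3.0, 1.0], [-1.0, 1.0], [1.0, 2.0], [1.0, 0.0]])
print(X)                     # [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(covariance_matrix(X))  # [[2.0, 0.0], [0.0, 0.5]]
```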
- Compute the eigenvectors of the matrix Σ:
[U,S,V] = svd(sigma)
- svd = singular value decomposition
- svd is more numerically stable than eig, which also gives the eigenvectors
- U, S and V are matrices; U is also an [n x n] matrix
- It turns out that the columns of U are exactly the u vectors we want
- So to reduce a system from n dimensions to k dimensions, just take the first k vectors of U (its first k columns)
- These k columns form the [n x k] matrix Ureduce
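Here is a dependency-free sketch of this step. The slides use Octave's svd; to avoid external libraries, this sketch swaps in a different technique, power iteration with deflation. Power iteration also yields the leading eigenvectors of a symmetric matrix such as Σ. Eigenvectors are only defined up to sign, so the results may differ in sign from svd. The function names are hypothetical. With NumPy, numpy.linalg.svd(sigma) would play the role of svd, and its U[:, :k] would be Ureduce.

```python
import math
import random


def power_iteration(A, iters=500, seed=0):
    """Dominant eigenvector (unit length) and eigenvalue of a symmetric PSD matrix A."""
    n = len(A)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # random start vector
    norm0 = math.sqrt(sum(x * x for x in v))
    v = [x / norm0 for x in v]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in this direction
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v' * A * v gives the eigenvalue for a unit vector v.
    eigval = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
    return v, eigval


def top_k_eigenvectors(sigma, k):
    """The k leading eigenvectors of sigma; each one is a column of Ureduce."""
    n = len(sigma)
    A = [row[:] for row in sigma]
    vectors = []
    for _ in range(k):
        v, lam = power_iteration(A)
        vectors.append(v)
        # Deflation: remove the component just found so the next run finds the next one.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors


def to_reduced(x, ureduce):
    """z = Ureduce' * x: the k coordinates of x in the reduced space."""
    return [sum(u_i * x_i for u_i, x_i in zip(u, x)) for u in ureduce]


sigma = [[2.0, 0.0], [0.0, 0.5]]  # covariance of a small mean-normalized sample
Ureduce = top_k_eigenvectors(sigma, k=1)
print(Ureduce)                          # one column, close to (1, 0) up to sign
print(to_reduced([2.0, 0.0], Ureduce))  # 1 number per example instead of 2
```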
NOW LET'S APPLY THIS TO OUR PROJECT
FEATURES TO BE CONSIDERED
- Type of file, e.g. .java, .cpp
- No. of lines of code
- No. of functions
- No. of variables used
- No. of if-else conditions
- No. of loops
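Here is a rough sketch of how these features might be extracted from a single submission. The regular expressions are assumptions for illustration and target C-like languages only (C, C++, Java). A real tool would use a proper parser and would also ignore keywords inside strings and comments. The slides do not say how the features were computed.

```python
import os
import re

# Hypothetical heuristics; a do-while loop is counted once, via its "while".
LOOP_RE = re.compile(r"\b(?:for|while)\b")
IF_ELSE_RE = re.compile(r"\b(?:if|else)\b")
FUNC_RE = re.compile(
    r"^\s*(?!(?:if|else|for|while|switch|return)\b)"  # skip control statements
    r"(?:[\w<>\[\]]+\s+)+\w+\s*\([^;]*\)\s*\{",          # type(s), name, (params) {
    re.MULTILINE,
)
# Simple declarations only, e.g. "int x"; it misses later names in "int n, total".
VAR_DECL_RE = re.compile(
    r"\b(?:int|long|float|double|char|bool|boolean|String)\s+(\w+)\b(?!\s*\()"
)


def extract_features(filename, source):
    """Map one submission to the features listed on the slide."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    return {
        "file_type": os.path.splitext(filename)[1],  # e.g. ".cpp"
        "lines_of_code": len(lines),                 # non-blank lines
        "functions": len(FUNC_RE.findall(source)),
        "variables": len(set(VAR_DECL_RE.findall(source))),
        "if_else": len(IF_ELSE_RE.findall(source)),
        "loops": len(LOOP_RE.findall(source)),
    }


code = (
    "int square(int x) {\n"
    "    return x * x;\n"
    "}\n"
    "\n"
    "int main() {\n"
    "    int n, total = 0;\n"
    "    for (int i = 0; i < n; i++) {\n"
    "        if (i % 2 == 0) total += square(i);\n"
    "        else total -= 1;\n"
    "    }\n"
    "    return 0;\n"
    "}\n"
)
print(extract_features("sol.cpp", code))
# {'file_type': '.cpp', 'lines_of_code': 11, 'functions': 2, 'variables': 3, 'if_else': 2, 'loops': 1}
```

This turns each submission into a small numeric vector, with the file type encoded as a number, for example one per extension. Vectors like these are the kind of input that the mean normalization, PCA and K-means steps above operate on.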
APPLICATIONS
Besides detecting cheating in online programming contests, this approach has the following applications:
- In our programming labs, it can be used to evaluate the programs submitted by students according to their complexity
- It can show which programming styles are currently in trend
THANK YOU!
