The document discusses machine learning techniques for finding patterns in data and using those patterns to make predictions. It covers topics like classification algorithms, decision trees, neural networks, learning as a search process, and how machine learning systems use bias to avoid overfitting training data. Examples are provided on classifying weather data to determine if a baseball game should be played, classifying iris flowers, predicting CPU performance, and diagnosing soybean diseases.
The document discusses machine learning techniques for finding patterns in data. It covers classification algorithms like decision trees and neural networks that can predict outcomes for new data based on patterns learned from training examples. The document also discusses concepts like bias, which refers to the assumptions built into machine learning algorithms that guide their search for patterns and prevent overfitting to noise in the training data. Examples are provided to illustrate classification problems and solutions like rules learned to predict gameplay based on weather conditions.
The document discusses machine learning concepts related to classification, including linear regression, decision trees, and neural networks. It provides an example of using weather data to classify whether a game will be played or not based on attributes like temperature and humidity. Rules are generated to make the classification based on patterns in the data.
The document provides an overview of concepts related to computing for bioinformatics including machine learning, data mining, knowledge discovery, statistics, databases, and data visualization. It discusses techniques like classification, clustering, association rule mining, and anomaly detection. It also presents examples of applying these techniques to problems like weather prediction, contact lens recommendation, and soybean disease diagnosis.
This document provides an overview of data mining and machine learning techniques. It discusses how machine learning can be used to extract patterns and insights from large amounts of data. Examples are provided of applications in domains like credit lending, oil spill detection, load forecasting, and machine fault diagnosis. The document also covers key concepts in data mining like the generalization process, bias, and some ethical considerations around using these techniques.
This document provides information and expectations for students completing a work placement as part of their early years education qualification. It outlines that students will spend 400 hours over 2 years gaining real work experience working with children from birth to 7 years old. The placement objectives are to gain experience in the roles and responsibilities of childcare practitioners, develop communication and teamwork skills, and gain insight into their career path. The document details the student code of conduct which includes expectations around attendance, attitude, safeguarding, confidentiality, and more. It provides guidance for students on preparing for and conducting themselves during their placement.
UC Library's specialised services for postgrads; key chemistry information resources and tools; database tips and tricks; keeping up-to-date; and using Endnote to make citations a breeze.
Using Machine Learning in Networks Intrusion Detection Systems - Omar Shaya
The internet and the spread of computing devices, from desktop computers to smartphones, have raised many security and privacy concerns, creating a need for automated systems that detect attacks on these networks so that they can be protected at scale. Traditional intrusion detection methods may be able to detect previously known attacks, but dealing with new, unknown attacks remains a problem, which makes machine learning a strong candidate for addressing these challenges.
In this report, we investigate the use of machine learning for detecting network attacks (intrusion detection) by reviewing work that has been done in this field. In particular, we look at the work of Pasocal et al.
The document summarizes a study that examined teenagers' views on masculine and feminine personality traits. Male participants were more likely than females to label traits as stereotypically masculine or feminine. However, both males and females labeled many traits as neutral, contrary to the hypothesis. This may indicate that younger generations are moving away from strict gender stereotypes. Repeating the study with a larger, more diverse sample could provide more insights into generational differences in views of masculinity and femininity.
The document provides an overview of a presentation on types of research given by Manoj Patel. It defines research and lists its main objectives as extending knowledge, revealing hidden facts, generalizing laws, and verifying existing theories and facts. The presentation then describes several common types of research, including descriptive and analytical research, applied and fundamental research, quantitative and qualitative research, conceptual and empirical research, and others. It provides examples to illustrate the differences between each type.
This research presentation compares two camera options and makes a recommendation. It outlines the purpose and methodology of the study, including the data sources and a decision matrix. The results are presented by showing the strengths and weaknesses of each camera option, and a comparison matrix informs the final recommendation of which camera is best and why.
The document discusses research design, which is a framework that specifies the procedures needed to structure and solve a research problem. It defines the information required and outlines measurement, sampling, data collection, and analysis plans. The document compares exploratory, descriptive, and causal research designs and cross-sectional vs longitudinal studies. Key factors like objectives, characteristics, findings, and outcomes are contrasted for different design types. Common errors in research are also outlined.
This document provides an overview of different types of research designs, including exploratory, descriptive, diagnostic, and hypothesis-testing designs. It defines what a research design is and lists key features of a good research design such as minimizing bias. For each type of design, it provides a brief definition and highlights important aspects to consider, such as the objective, data collection methods, sample selection, and data analysis. The overall purpose is to introduce and compare different approaches to research design.
Literature Review (Review of Related Literature - Research Methodology) - Dilip Barad
Literature Review or Review of Related Literature is one of the most vital stages in any research. This presentation attempts to throw some light on the process and important aspects of literature review.
This document provides an overview of key concepts in research methodology, including:
1. It defines research as an organized and systematic process of finding answers to questions through a defined set of steps and procedures.
2. It discusses different types of research including quantitative, qualitative, basic, applied, longitudinal, descriptive, classification, comparative, exploratory, explanatory, causal, theory testing, and theory building research.
3. It also discusses alternatives to research-based knowledge such as relying on authority, tradition, common sense, media, and personal experience.
This document provides an overview of data mining and machine learning concepts. It discusses the data mining process, examples of data mining applications in areas like loan approval, image analysis, and load forecasting. It also covers machine learning techniques like decision trees and rules. Additionally, it addresses topics such as the role of domain knowledge, generalization as a search problem, and biases that influence machine learning models.
This document discusses classification, which is a type of supervised machine learning where algorithms are used to predict categorical class labels. There is a two-step process: 1) model construction using a training dataset to develop rules or formulas for classification, and 2) model usage to classify new data. Common applications include credit approval, target marketing, medical diagnosis, and treatment effectiveness analysis. The document also covers Bayesian classification, which uses probability distributions over class labels to classify new data instances based on attribute values and their probabilities.
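The two-step process and the Bayesian approach described above can be sketched in a few lines. This is a minimal illustration that assumes the naive Bayes variant (attributes treated as independent given the class) and uses no smoothing; the summary does not say which variant the document uses. The function names `train` and `classify` and the eight weather-style rows are hypothetical, made up for the example.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Step 1 (model construction): count classes and per-class attribute values."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                value_counts[(attr, r[target])][value] += 1
    return class_counts, value_counts

def classify(instance, class_counts, value_counts):
    """Step 2 (model usage): return a normalized probability for each class."""
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior P(class)
        for attr, value in instance.items():
            # Naive assumption: multiply P(value | class) for each attribute.
            score *= value_counts[(attr, cls)][value] / n
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical training rows (not taken from the document).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
model = train(rows, "play")
posterior = classify({"outlook": "sunny", "windy": "no"}, *model)
print(posterior)
```

Without smoothing, an attribute value never seen with a class drives that class's score to zero, so a practical version would usually add a small count to every cell.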
The document describes the process of constructing decision trees. It begins with an example weather dataset and shows how to build a decision tree to predict whether to play or not based on attributes like outlook, temperature, etc. It then discusses the key steps in constructing decision trees which include selecting the best attribute to split on at each node based on information gain. It also discusses overfitting and the need for tree pruning. The document provides formulas to calculate information gain and discusses strategies like using a chi-squared test to select statistically robust splits during tree construction.
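As a rough sketch of the attribute-selection step, information gain is the entropy of the class labels minus the weighted entropy of the subsets produced by a split. The helper names below are hypothetical, and the class counts per outlook value are the ones commonly quoted for the textbook weather data; they are an assumption here, not figures taken from the document.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Assumed counts: sunny 2 yes / 3 no, overcast 4 yes / 0 no, rainy 3 yes / 2 no.
rows = (
    [{"outlook": "sunny", "play": "yes"}] * 2
    + [{"outlook": "sunny", "play": "no"}] * 3
    + [{"outlook": "overcast", "play": "yes"}] * 4
    + [{"outlook": "rainy", "play": "yes"}] * 3
    + [{"outlook": "rainy", "play": "no"}] * 2
)
gain = information_gain(rows, "outlook", "play")
print(round(gain, 3))
```

A tree builder would compute this gain for every candidate attribute at a node and split on the one with the largest value.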
The document discusses covering (rule-based) algorithms for generating classification rules from data. It provides an example of using a simple covering algorithm to iteratively generate rules that assign contact lens recommendations based on patient attributes. The algorithm works by selecting the test at each step that best separates the data (maximizes accuracy) until all instances are covered by rules or no further separation is possible.
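The loop described above can be sketched as follows. This is a simplified illustration in the spirit of a covering algorithm, not the document's exact procedure: the function name `learn_rules`, the tie-breaking order, and the four-row contact lens table are all assumptions made for the example.

```python
def learn_rules(rows, target, cls):
    """Return rules (dicts of attribute -> value) that together cover class `cls`."""
    remaining = list(rows)
    rules = []
    while any(r[target] == cls for r in remaining):
        rule, covered = {}, remaining
        # Grow the rule until it covers only `cls` or no attributes are left.
        while any(r[target] != cls for r in covered):
            candidates = {(a, r[a]) for r in covered for a in r
                          if a != target and a not in rule}
            if not candidates:
                break

            def score(test):
                """Accuracy of a test on the covered rows; ties go to larger coverage."""
                a, v = test
                matched = [r for r in covered if r[a] == v]
                positives = sum(r[target] == cls for r in matched)
                return (positives / len(matched), positives)

            # sorted() makes remaining ties deterministic.
            a, v = max(sorted(candidates), key=score)
            rule[a] = v
            covered = [r for r in covered if r[a] == v]
        rules.append(rule)
        # Remove the instances the new rule covers and continue with the rest.
        remaining = [r for r in remaining if r not in covered]
    return rules

# Hypothetical, heavily reduced contact lens data.
rows = [
    {"tear_rate": "reduced", "astigmatism": "no", "lens": "none"},
    {"tear_rate": "reduced", "astigmatism": "yes", "lens": "none"},
    {"tear_rate": "normal", "astigmatism": "no", "lens": "soft"},
    {"tear_rate": "normal", "astigmatism": "yes", "lens": "hard"},
]
hard_rules = learn_rules(rows, "lens", "hard")
print(hard_rules)
```

Running the same function once per class value yields a rule set for the whole table.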
The document discusses the classification and regression tree (CART) algorithm. It provides details on how CART builds decision trees using a greedy algorithm that recursively splits nodes based on thresholds of predictor variables. CART uses the Gini index criterion to find the optimal splits that result in homogenous subsets. An example is provided to demonstrate how CART constructs a decision tree to classify examples based on various predictor variables like outlook, temperature, humidity, and wind.
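To make the Gini criterion concrete, the sketch below scores candidate thresholds on a single numeric predictor and keeps the one with the lowest weighted impurity. It covers only one split on one variable, not the full recursive tree building; the helper names and the humidity readings are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a binary split, weighted by the size of each side."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, impurity)."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for v, label in pairs if v <= t]
        right = [label for v, label in pairs if v > t]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical humidity readings and play decisions.
humidity = [65, 70, 75, 80, 85, 90]
play = ["yes", "yes", "yes", "no", "no", "no"]
threshold, impurity = best_threshold(humidity, play)
print(threshold, impurity)
```

A greedy tree builder would repeat this search over every predictor at each node and recurse into the two resulting subsets.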
This document provides an introduction to machine learning and decision trees. It defines key concepts like deep learning, artificial intelligence, and machine learning. It then discusses different machine learning approaches such as supervised and unsupervised learning, along with decision trees as a specific algorithm. The document explains how decision trees are built by choosing features to split on at each node based on metrics like information gain and entropy. It provides an example of calculating entropy and information gain to select the best feature to split the root node on.
Three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning involves training a model using labeled input/output data where the desired outputs are provided, allowing the model to map inputs to outputs. Unsupervised learning involves discovering hidden patterns in unlabeled data and grouping similar data points together. Reinforcement learning involves an agent learning through trial-and-error interactions with a dynamic environment by receiving rewards or punishments for actions.
This document discusses techniques for teaching decision-making skills to students. It provides examples of how to identify opportunities for students to make decisions within lesson content by responding to problems, creating new problems, or identifying opportunities. The document also outlines specific techniques students can use to make decisions, such as using a PMI chart, weighted sums, decision trees, and Edward de Bono's Six Thinking Hats model. The overall goal is to scaffold the learning of decision-making skills by giving students practice applying techniques in contextual examples rather than isolation.
This presentation covers Decision Tree as a supervised machine learning technique, talking about Information Gain method and Gini Index method with their related Algorithms.
The document discusses machine learning and different types of learning problems. It begins by explaining that machine learning allows systems to learn knowledge that engineers may not know how to provide. It then describes several types of learning including supervised learning (prediction from labeled examples), clustering (finding natural groupings of unlabeled data), and reinforcement learning (learning from rewards/penalties). The document provides examples of different learning problems and algorithms like decision trees. It emphasizes that the goal of learning is to find patterns in data and make accurate predictions, especially for previously unseen examples.
The document provides an overview of machine learning and decision tree learning. It discusses how machine learning can be applied to problems that are too difficult to program by hand, such as autonomous driving. It then describes decision tree learning, including how decision trees work, how the ID3 algorithm builds decision trees in a top-down manner by selecting the attribute that best splits the data at each step, and how decision trees can be converted to rules.
This document discusses probability rules and models. It begins by defining key terms like probability model, sample space, and event. It then presents the formula for calculating probabilities when outcomes are equally likely. Several basic probability rules are covered, including that a probability must be between 0 and 1 and the complement and addition rules. Examples are provided to demonstrate how to calculate probabilities and use two-way tables and Venn diagrams to find probabilities involving two events. The general addition rule for calculating P(A or B) is also explained.
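The equally likely outcomes formula and the general addition rule, P(A or B) = P(A) + P(B) - P(A and B), can be checked by enumerating a small sample space. The card-drawing example below is an assumption chosen for illustration; it is not taken from the document.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # sample space: 52 equally likely outcomes

def prob(event):
    """P(event) = favourable outcomes / total outcomes for equally likely outcomes."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

# Direct count of "heart or face card".
p_direct = prob(lambda c: is_heart(c) or is_face(c))

# General addition rule: subtract the overlap so it is not counted twice.
p_rule = prob(is_heart) + prob(is_face) - prob(lambda c: is_heart(c) and is_face(c))

# Complement rule: P(not A) = 1 - P(A).
p_not_heart = 1 - prob(is_heart)

print(p_direct, p_rule, p_not_heart)
```

Using `Fraction` keeps the probabilities exact, so the two ways of computing P(A or B) can be compared with plain equality.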
This document analyzes YouTube's business model. It explains that YouTube and other online video sites represent a new business model for audiovisual content, driven by the change in consumption habits brought about by new technologies. It describes how YouTube takes advantage of user participation to improve continuously and to attract an audience different from that of traditional media.
The defense was successful in portraying Michael Jackson favorably to the jury in several ways:
1) They dressed Jackson in ornate costumes that conveyed images of purity, innocence, and humility.
2) Jackson was shown entering the courtroom as if on a red carpet, emphasizing his celebrity status.
3) Jackson appeared vulnerable, childlike, and in declining health during the trial, eliciting sympathy from jurors.
4) Defense attorney Tom Mesereau effectively presented a coherent narrative of Jackson as a victim and portrayed Neverland as a place of refuge, undermining the prosecution's arguments.
Michael Jackson was born in 1958 in Gary, Indiana and rose to fame in the 1960s as the lead singer of The Jackson 5, topping music charts in the 1970s. As a solo artist in the 1980s, his album Thriller broke music records. In the 1990s and 2000s, Jackson faced several legal issues related to child abuse allegations while continuing to release music. He married Lisa Marie Presley and Debbie Rowe and had two children before his death in 2009.
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ... - butest
This document appears to be a list of popular books from various authors. It includes over 150 book titles across many genres such as fiction, non-fiction, memoirs, and novels. The books cover a wide range of topics from politics to cooking to autobiographies.
The prosecution lost the Michael Jackson trial due to several key mistakes and weaknesses in their case:
1) The lead prosecutor, Thomas Sneddon, was too personally invested in the case against Jackson, having pursued him for over a decade without success.
2) Sneddon's opening statement was disorganized and weak, failing to effectively outline the prosecution's case.
3) The accuser's mother was not credible and damaged the prosecution's case through her erratic testimony, history of lies and con artist behavior.
4) Many prosecution witnesses were not credible due to prior lawsuits against Jackson, debts owed to him, or having been fired by him. Several witnesses even took the Fifth Amendment.
Here are three examples of public relations from around the world:
1. The UK government's "Be Clear on Cancer" campaign which aims to raise awareness of cancer symptoms and encourage early diagnosis.
2. Samsung's global brand marketing and sponsorship activities which aim to increase brand awareness and favorability of Samsung products worldwide.
3. The Brazilian government's efforts to improve its international image and relations with other countries through strategic communication and diplomacy.
The three most important functions of public relations are:
1. Media relations because the media is how most organizations reach their key audiences. Strong media relationships are crucial.
2. Writing, because written communication is at the core of public relations and how most information is
Michael Jackson Please Wait... provides biographical information about Michael Jackson including his birthdate, birthplace, parents, height, interests, idols, favorite foods, films, and more. It discusses his background, career highlights including influential albums like Thriller, and films he appeared in such as The Wiz and Moonwalker. The document contains photos and details about Jackson's life and illustrious music career.
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz - butest
The document discusses the process of manufacturing celebrity and its negative byproducts. It argues that celebrities are rarely the best in their individual pursuits like singing, dancing, etc. but become famous due to being products of a system controlled by wealthy elites. This system stifles opportunities for worthy artists and creates feudalism. The document also asserts that manufactured celebrities should not be viewed as role models due to behaviors like drug abuse and narcissism that result from the celebrity-making process.
Michael Jackson was a child star who rose to fame with the Jackson 5 in the late 1960s and early 1970s. As a solo artist in the 1970s and 1980s, he had immense commercial success with albums like Off the Wall, Thriller, and Bad, which featured hit singles and groundbreaking music videos. However, his career and public image were plagued by controversies related to allegations of child sexual abuse in the 1990s and 2000s. He continued recording and performing but faced ongoing media scrutiny into his private life until his death in 2009.
Social Networks: Twitter Facebook SL - Slide 1 - butest
The document discusses using social networking tools like Twitter and Facebook in K-12 education. Twitter allows students and teachers to share short updates and can be used to give parents a window into classroom activities. Facebook allows targeted advertising that could be used to promote educational activities. Both tools could help facilitate communication between schools and communities if used properly while managing privacy and security concerns.
Facebook has over 300 million active users who log on daily, and allows brands to create public profile pages to interact with users. Pages are for brands and organizations only, while groups can be made by any user about any topic. Pages do not show admin names and have no limits on fans, while groups display admin names and are limited to 5,000 members. Content on pages should aim to provoke action from subscribers and establish a regular posting schedule using a conversational tone.
Executive Summary Hare Chevrolet is a General Motors dealership ... - butest
Hare Chevrolet is a car dealership located in Noblesville, Indiana that has successfully used social media platforms like Twitter, Facebook, and YouTube to create a positive brand image. They invest significant time interacting directly with customers online to foster a sense of community rather than overtly advertising. As a result, Hare Chevrolet has built a large, engaged audience on social media and serves as a model for how brands can use online presences strategically.
Welcome to the Dougherty County Public Library's Facebook and ... - butest
This document provides instructions for signing up for Facebook and Twitter accounts. It outlines the sign up process for both platforms, including filling out forms with name, email, password and other details. It describes how the platforms will then search for friends and suggest people to connect with. It also explains how to search for and follow the Dougherty County Public Library page on both Facebook and Twitter once signed up. The document concludes by thanking participants and providing a contact for any additional questions.
Paragon Software announces the release of Paragon NTFS for Mac OS X 8.0, which provides full read and write access to NTFS partitions on Macs. It is the fastest NTFS driver on the market, achieving speeds comparable to native Mac file systems. Paragon NTFS for Mac 8.0 fully supports the latest Mac OS X Snow Leopard operating system in 64-bit mode and allows easy transfer of files between Windows and Mac partitions without additional hardware or software.
This document provides compatibility information for Olympus digital products used with Macintosh OS X. It lists various digital cameras, photo printers, voice recorders, and accessories along with their connection type and any notes on compatibility. Some products require booting into OS 9.1 for software compatibility or do not support devices that need a serial port. Drivers and software are available for download from Olympus and other websites for many products to enable use with OS X.
To use printers managed by the university's Information Technology Services (ITS), students and faculty must install the ITS Remote Printing software on their Mac OS X computer. This allows them to add network printers, log in with their ITS account credentials, and print documents while being charged per page to funds in their pre-paid ITS account. The document provides step-by-step instructions for installing the software, adding a network printer, and printing to that printer from any internet connection on or off campus. It also explains the pay-in-advance printing payment system and how to check printing charges.
The document provides an overview of the Mac OS X user interface for beginners, including descriptions of the desktop, login screen, desktop elements like the dock and hard disk, and how to perform common tasks like opening files and folders. It also addresses frequently asked questions for Windows users switching to Mac OS X, such as where documents are stored, how to save or find documents, and what the equivalent of the C: drive is in Mac OS X. The document concludes with sections on file management tasks like creating and deleting folders, organizing files within applications, using Spotlight search, and an overview of the Dashboard feature.
This document provides a checklist for securing Mac OS X version 10.5, focusing on hardening the operating system, securing user accounts and administrator accounts, enabling file encryption and permissions, implementing intrusion detection, and maintaining password security. It describes the Unix infrastructure and security framework that Mac OS X is built on, leveraging open source software and following the Common Data Security Architecture model. The checklist can be used to audit a system or harden it against security threats.
This document summarizes a course on web design that was piloted in the summer of 2003. The course was a 3 credit course that met 4 times a week for lectures and labs. It covered topics such as XHTML, CSS, JavaScript, Photoshop, and building a basic website. 18 students from various majors enrolled. Student and instructor evaluations found the course to be very successful overall, though some improvements were suggested like ensuring proper software and pairing programming/non-programming students. The document also discusses implications of incorporating web design material into existing computer science curriculums.
3. Finding patterns
Goal: programs that detect patterns and regularities in the data
Strong patterns ⇒ good predictions
Problem 1: most patterns are not interesting
Problem 2: patterns may be inexact (or spurious)
Problem 3: data may be garbled or missing
4. Machine learning techniques
Algorithms for acquiring structural descriptions from examples
Structural descriptions represent patterns explicitly
  Can be used to predict the outcome in a new situation
  Can be used to understand and explain how a prediction is derived (which may be even more important)
Methods originate from artificial intelligence, statistics, and research on databases
witten&eibe
5. Can machines really learn?
Definitions of "learning" from the dictionary:
  To get knowledge of by study, experience, or being taught
  To become aware by information or from observation
  (these two are difficult to measure)
  To commit to memory
  To be informed of, ascertain; to receive instruction
  (these two are trivial for computers)
Operational definition: things learn when they change their behavior in a way that makes them perform better in the future.
Does a slipper learn?
Does learning imply intention?
6. Classification
Learn a method for predicting the instance class from pre-labeled (classified) instances
Many approaches: regression, decision trees, Bayesian methods, neural networks, ...
Given a set of points from known classes, what is the class of a new point?
7. Classification: Linear Regression
Decision boundary: w_0 + w_1 x + w_2 y >= 0
Regression computes the weights w_i from the data to minimize the squared error, i.e. to "fit" the data
Not flexible enough
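The idea on this slide can be sketched as follows: fit the weights by least squares against class targets, then classify a point by the sign of w_0 + w_1 x + w_2 y. This is only a minimal illustration under stated assumptions. The toy points, the +1/-1 target encoding, and the helper names `solve`, `fit` and `classify` are hypothetical and do not come from the slides.

```python
def solve(a, b):
    """Solve the linear system a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit(points, labels):
    """Least-squares weights [w0, w1, w2] for targets +1 (True) and -1 (False)."""
    rows = [[1.0, x, y] for x, y in points]
    targets = [1.0 if label else -1.0 for label in labels]
    # Normal equations: (X^T X) w = X^T t
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtt = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve(xtx, xtt)


def classify(w, x, y):
    """Apply the decision rule w0 + w1*x + w2*y >= 0."""
    return w[0] + w[1] * x + w[2] * y >= 0


# Hypothetical toy data: three points per class, separable by the line x = 0.
POINTS = [(1, 1), (1, -1), (2, 0), (-1, 1), (-1, -1), (-2, 0)]
LABELS = [True, True, True, False, False, False]

weights = fit(POINTS, LABELS)
predictions = [classify(weights, x, y) for x, y in POINTS]
```

A single straight line is all this model can express, which is the limitation the slide summarizes as "not flexible enough".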
11. The weather problem
Given past data, can you come up with the rules for Play / Not Play? What is the game?

Outlook   Temperature  Humidity  Windy  Play
sunny     hot          high      false  no
sunny     hot          high      true   no
overcast  hot          high      false  yes
rainy     mild         high      false  yes
rainy     mild         normal    false  yes
rainy     mild         normal    true   no
overcast  mild         normal    true   yes
sunny     mild         high      false  no
sunny     mild         normal    false  yes
rainy     mild         normal    false  yes
sunny     mild         normal    true   yes
overcast  mild         high      true   yes
overcast  hot          normal    false  yes
rainy     mild         high      true   no
12. The weather problem
Given this data, what are the rules for play/not play?

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         Normal    False  Yes
…         …            …         …      …
13. The weather problem
Conditions for playing:
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
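The rules above only work when read in order, as a decision list: the first rule whose condition holds decides the outcome. A minimal sketch, assuming that ordered reading, is shown here. The function name `play` and the tuple layout of `WEATHER` are illustrative choices; the 14 instances are taken from slide 11, with temperature left out because no rule tests it.

```python
def play(outlook, humidity, windy):
    """Apply the five rules in order; the first matching rule decides."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # none of the above


# (outlook, humidity, windy, play) for the 14 instances of slide 11.
WEATHER = [
    ("sunny", "high", False, "no"),
    ("sunny", "high", True, "no"),
    ("overcast", "high", False, "yes"),
    ("rainy", "high", False, "yes"),
    ("rainy", "normal", False, "yes"),
    ("rainy", "normal", True, "no"),
    ("overcast", "normal", True, "yes"),
    ("sunny", "high", False, "no"),
    ("sunny", "normal", False, "yes"),
    ("rainy", "normal", False, "yes"),
    ("sunny", "normal", True, "yes"),
    ("overcast", "high", True, "yes"),
    ("overcast", "normal", False, "yes"),
    ("rainy", "high", True, "no"),
]

# Instances on which the rule list disagrees with the recorded outcome.
misclassified = [row for row in WEATHER if play(*row[:3]) != row[3]]
print(len(misclassified), "of", len(WEATHER), "instances misclassified")
```

Taken out of order, the fourth rule alone would wrongly predict play = yes for a rainy, windy day with normal humidity, so the ordering is part of the description.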
14. Weather data with mixed attributes

Outlook   Temperature  Humidity  Windy  Play
sunny     85           85        false  no
sunny     80           90        true   no
overcast  83           86        false  yes
rainy     70           96        false  yes
rainy     68           80        false  yes
rainy     65           70        true   no
overcast  64           65        true   yes
sunny     72           95        false  no
sunny     69           70        false  yes
rainy     75           80        false  yes
sunny     75           70        true   yes
overcast  72           90        true   yes
overcast  81           75        false  yes
rainy     71           91        true   no
15. Weather data with mixed attributes
How will the rules change when some attributes have numeric values?

Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes
…         …            …         …      …
16. Weather data with mixed attributes
Rules with mixed attributes:
If outlook = sunny and humidity > 83 then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes
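With numeric attributes the rule conditions become threshold tests. One way to sketch this is to store the rules as data, as (condition, decision) pairs, and report which rule fired. The names `RULES` and `classify` and the dictionary keys are assumptions made for illustration; the thresholds are the ones on the slide.

```python
# Ordered rule list: (condition, decision). The first matching rule wins.
RULES = [
    (lambda r: r["outlook"] == "sunny" and r["humidity"] > 83, "no"),
    (lambda r: r["outlook"] == "rainy" and r["windy"], "no"),
    (lambda r: r["outlook"] == "overcast", "yes"),
    (lambda r: r["humidity"] < 85, "yes"),
    (lambda r: True, "yes"),  # none of the above
]


def classify(instance):
    """Return (rule number, decision) for the first rule whose condition holds."""
    for number, (condition, decision) in enumerate(RULES, start=1):
        if condition(instance):
            return number, decision


# Sample instances taken from the mixed-attribute weather data.
sunny_humid = {"outlook": "sunny", "temperature": 85, "humidity": 85, "windy": False}
rainy_calm = {"outlook": "rainy", "temperature": 75, "humidity": 80, "windy": False}
rainy_humid = {"outlook": "rainy", "temperature": 70, "humidity": 96, "windy": False}

for instance in (sunny_humid, rainy_calm, rainy_humid):
    print(instance["outlook"], instance["humidity"], "->", classify(instance))
```

Keeping the rules as data makes it easy to see that only the humidity tests changed relative to the nominal version; temperature is still unused.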
17. The contact lenses data

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   No           Reduced               None
Young           Myope                   No           Normal                Soft
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   No           Reduced               None
Pre-presbyopic  Myope                   No           Normal                Soft
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            No           Reduced               None
Pre-presbyopic  Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   No           Reduced               None
Presbyopic      Myope                   No           Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            No           Reduced               None
Presbyopic      Hypermetrope            No           Normal                Soft
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None
18. A complete and correct rule set
If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
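Unlike the weather rules, these nine rules can be read in any order. A rough way to check the "complete" part of the claim is to enumerate all 24 attribute combinations and confirm that at least one rule fires for each and that all firing rules agree. The sketch below does that. The dictionary encoding of the rules and the names `RULES`, `recommendations` and `check_rule_set` are assumptions for illustration.

```python
from itertools import product

AGES = ("young", "pre-presbyopic", "presbyopic")
PRESCRIPTIONS = ("myope", "hypermetrope")
ASTIGMATIC = ("no", "yes")
TEAR_RATES = ("reduced", "normal")

# Each rule is (conditions, recommendation); an attribute that is absent is "don't care".
RULES = [
    ({"tear": "reduced"}, "none"),
    ({"age": "young", "astigmatic": "no", "tear": "normal"}, "soft"),
    ({"age": "pre-presbyopic", "astigmatic": "no", "tear": "normal"}, "soft"),
    ({"age": "presbyopic", "prescription": "myope", "astigmatic": "no"}, "none"),
    ({"prescription": "hypermetrope", "astigmatic": "no", "tear": "normal"}, "soft"),
    ({"prescription": "myope", "astigmatic": "yes", "tear": "normal"}, "hard"),
    ({"age": "young", "astigmatic": "yes", "tear": "normal"}, "hard"),
    ({"age": "pre-presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
    ({"age": "presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
]


def recommendations(instance):
    """Set of recommendations made by every rule that fires on the instance."""
    return {
        rec
        for conditions, rec in RULES
        if all(instance[attr] == value for attr, value in conditions.items())
    }


def check_rule_set():
    """Map each of the 24 attribute combinations to the set of recommendations it receives."""
    result = {}
    for age, prescription, astigmatic, tear in product(
        AGES, PRESCRIPTIONS, ASTIGMATIC, TEAR_RATES
    ):
        instance = {
            "age": age,
            "prescription": prescription,
            "astigmatic": astigmatic,
            "tear": tear,
        }
        result[(age, prescription, astigmatic, tear)] = recommendations(instance)
    return result


outcome = check_rule_set()
uncovered = [combo for combo, recs in outcome.items() if not recs]
conflicting = [combo for combo, recs in outcome.items() if len(recs) > 1]
print("uncovered:", uncovered, "conflicting:", conflicting)
```

This checks coverage and agreement only. Correctness still has to be judged against the recommendations recorded in the contact lenses table.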
22. Soybean classification

             Attribute                Number of values  Sample value
Environment  Time of occurrence       7                 July
             Precipitation            3                 Above normal
             …
Seed         Condition                2                 Normal
             Mold growth              2                 Absent
             …
Fruit        Condition of fruit pods  4                 Normal
             Fruit spots              5                 ?
Leaves       Condition                2                 Abnormal
             Leaf spot size           3                 ?
             …
Stem         Condition                2                 Abnormal
             Stem lodging             2                 Yes
             …
Roots        Condition                3                 Normal
Diagnosis                             19                Diaporthe stem canker
23. The role of domain knowledge
If leaf condition is normal and stem condition is abnormal and stem cankers is below soil line and canker lesion color is brown then diagnosis is rhizoctonia root rot
If leaf malformation is absent and stem condition is abnormal and stem cankers is below soil line and canker lesion color is brown then diagnosis is rhizoctonia root rot
But in this domain, "leaf condition is normal" implies "leaf malformation is absent"!
25. Learning as search
Inductive learning: find a concept description that fits the data
Example: rule sets as description language
  Enormous, but finite, search space
Simple solution:
  enumerate the concept space
  eliminate descriptions that do not fit the examples
  surviving descriptions contain the target concept
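The "simple solution" can be sketched on a tiny concept space. The example borrows the two-attribute colour/animal domain and the three labelled examples from the version space slides later in this deck. The conjunctive description language with a `*` wildcard and the names `covers`, `consistent` and `survivors` are assumptions for illustration.

```python
from itertools import product

WILD = "*"
DOMAINS = [("red", "green"), ("cow", "chicken")]

# (instance, is_positive) pairs from the version space example.
EXAMPLES = [
    (("green", "cow"), True),
    (("red", "chicken"), False),
    (("green", "chicken"), True),
]


def covers(description, instance):
    """A description covers an instance if every attribute matches or is a wildcard."""
    return all(d == WILD or d == value for d, value in zip(description, instance))


def consistent(description, examples):
    """True if the description covers all positive and no negative examples."""
    return all(covers(description, inst) == positive for inst, positive in examples)


# Enumerate every description: each attribute is a concrete value or the wildcard.
concept_space = list(product(*[values + (WILD,) for values in DOMAINS]))

# Eliminate descriptions that do not fit the examples.
survivors = [d for d in concept_space if consistent(d, EXAMPLES)]

print(len(concept_space), "descriptions,", len(survivors), "survive:", survivors)
```

Here the space has only 9 descriptions, so brute-force enumeration is trivial. The approach stops being practical as soon as the description language allows sets of rules.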
26. Enumerating the concept space
Search space for the weather problem
  4 x 4 x 3 x 3 x 2 = 288 possible combinations
  With 14 rules ⇒ 2.7 x 10^34 possible rule sets
Solution: candidate-elimination algorithm
Other practical problems:
  More than one description may survive
  No description may survive
    Language is unable to describe the target concept
    or data contains noise
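The two figures on this slide can be reproduced with a few lines of arithmetic. The breakdown of the factors is an assumption consistent with the slide's product: each attribute contributes its number of values plus one "don't care" option, and the class contributes its two values.

```python
from math import prod

# Assumed breakdown of 4 x 4 x 3 x 3 x 2: attribute values plus one "don't care"
# option per attribute, and two possible classes for the rule's conclusion.
CHOICES = {
    "outlook": 3 + 1,
    "temperature": 3 + 1,
    "humidity": 2 + 1,
    "windy": 2 + 1,
    "play": 2,
}

possible_rules = prod(CHOICES.values())
possible_rule_sets = possible_rules ** 14  # one of the 288 rules in each of 14 slots

print(possible_rules)
print(f"{possible_rule_sets:.1e}")
```

The count of rule sets treats the 14 rules as an ordered sequence with repetition allowed, which is the reading that yields 288^14.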
27. The version space
Space of consistent concept descriptions
Completely determined by two sets:
  L: most specific descriptions that cover all positive examples and no negative ones
  G: most general descriptions that do not cover any negative examples and cover all positive ones
Only L and G need be maintained and updated
But: still computationally very expensive
And: does not solve other practical problems
28. *Version space example, 1
Given: red or green cows or chicken
Start with: L = {}, G = {<*, *>}
First example: <green, cow>: positive
How does this change L and G?
29. *Version space example, 2
Given: red or green cows or chicken
Result: L = {<green, cow>}, G = {<*, *>}
Second example: <red, chicken>: negative
30. *Version space example, 3
Given: red or green cows or chicken
Result: L = {<green, cow>}, G = {<green, *>, <*, cow>}
Final example: <green, chicken>: positive
31. *Version space example, 4
Given: red or green cows or chicken
Resultant version space: L = {<green, *>}, G = {<green, *>}
32. *Version space example, 5
Given: red or green cows or chicken
L = {}                G = {<*, *>}
<green, cow>: positive
L = {<green, cow>}    G = {<*, *>}
<red, chicken>: negative
L = {<green, cow>}    G = {<green, *>, <*, cow>}
<green, chicken>: positive
L = {<green, *>}      G = {<green, *>}
33. *Candidate-elimination algorithm
Initialize L and G
For each example e:
  If e is positive:
    Delete all elements from G that do not cover e
    For each element r in L that does not cover e:
      Replace r by all of its most specific generalizations that
        1. cover e and
        2. are more specific than some element in G
    Remove elements from L that are more general than some other element in L
  If e is negative:
    Delete all elements from L that cover e
    For each element r in G that covers e:
      Replace r by all of its most general specializations that
        1. do not cover e and
        2. are more general than some element in L
    Remove elements from G that are more specific than some other element in G
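The algorithm above can be sketched for the simplest description language: conjunctions of attribute values with a `*` wildcard, as in the cow/chicken example. This is a minimal illustration, not a general implementation. The representation of the empty L as a single bottom description `None`, and all function names, are assumptions. In this language each description has exactly one most specific generalization that covers a new example.

```python
WILD = "*"


def covers(h, e):
    """h covers e if every attribute of h is a wildcard or equals e's value."""
    return h is not None and all(a == WILD or a == v for a, v in zip(h, e))


def at_least_as_general(h1, h2):
    """True if h1 covers everything that h2 covers (None covers nothing)."""
    if h2 is None:
        return True
    if h1 is None:
        return False
    return all(a == WILD or a == b for a, b in zip(h1, h2))


def generalize(h, e):
    """Most specific generalization of h that covers e."""
    if h is None:
        return tuple(e)
    return tuple(a if a == v else WILD for a, v in zip(h, e))


def specialize(h, e, domains):
    """Most general specializations of h that exclude e."""
    return [
        h[:i] + (v,) + h[i + 1:]
        for i, a in enumerate(h) if a == WILD
        for v in domains[i] if v != e[i]
    ]


def candidate_elimination(examples, domains):
    L = [None]                          # bottom description: covers nothing
    G = [tuple(WILD for _ in domains)]  # top description: covers everything
    for e, positive in examples:
        if positive:
            G = [g for g in G if covers(g, e)]
            cand = []
            for r in L:
                c = r if covers(r, e) else generalize(r, e)
                if covers(r, e) or any(at_least_as_general(g, c) for g in G):
                    cand.append(c)
            cand = list(dict.fromkeys(cand))  # drop duplicates, keep order
            L = [h for h in cand
                 if not any(o != h and at_least_as_general(h, o) for o in cand)]
        else:
            L = [h for h in L if not covers(h, e)]
            cand = []
            for r in G:
                if not covers(r, e):
                    cand.append(r)
                    continue
                cand.extend(c for c in specialize(r, e, domains)
                            if any(at_least_as_general(c, h) for h in L))
            cand = list(dict.fromkeys(cand))
            G = [h for h in cand
                 if not any(o != h and at_least_as_general(o, h) for o in cand)]
    return L, G


DOMAINS = [("red", "green"), ("cow", "chicken")]
EXAMPLES = [
    (("green", "cow"), True),
    (("red", "chicken"), False),
    (("green", "chicken"), True),
]

L, G = candidate_elimination(EXAMPLES, DOMAINS)
print("L =", L, "G =", G)
```

Running it on the three examples reproduces the trace of the version space example: L and G both converge to the single description <green, *>. If either list ends up empty, no description in the language is consistent with the examples.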
35. Bias
Important decisions in learning systems:
  Concept description language
  Order in which the space is searched
  Way that overfitting to the particular training data is avoided
These form the "bias" of the search:
  Language bias
  Search bias
  Overfitting-avoidance bias
36. Language bias
Important question: is the language universal or does it restrict what can be learned?
Universal language can express arbitrary subsets of examples
If the language includes logical or ("disjunction"), it is universal
  Example: rule sets
Domain knowledge can be used to exclude some concept descriptions a priori from the search
37. Search bias
Search heuristic
  "Greedy" search: performing the best single step
  "Beam search": keeping several alternatives
  …
Direction of search
  General-to-specific
    E.g. specializing a rule by adding conditions
  Specific-to-general
    E.g. generalizing an individual instance into a rule
38. Overfitting-avoidance bias
Can be seen as a form of search bias
Modified evaluation criterion
  E.g. balancing simplicity and number of errors
Modified search strategy
  E.g. pruning (simplifying a description)
    Pre-pruning: stops at a simple description before search proceeds to an overly complex one
    Post-pruning: generates a complex description first and simplifies it afterwards