ML Zoomcamp 3 - Machine Learning for Classification

Sep 17, 2021 · 1 like · 1,048 views

Alexey Grigorev

The document discusses the CRISP-DM methodology for machine learning projects. It describes six steps: 1) business understanding, to define goals; 2) data understanding, to analyze the available sources; 3) data preparation, to transform the data; 4) modeling, to select the best model; 5) evaluation, to check that the goals are met; and 6) deployment to production. Together these steps form an iterative process from problem definition to model deployment.

ML Zoomcamp 1.3 - Supervised Machine Learning (Alexey Grigorev)

The document discusses supervised machine learning. It defines supervised machine learning and provides examples of regression, classification, and ranking problems. It also includes examples of datasets with features and target values for classification and regression problems. Machine learning algorithms are trained on these labeled examples to learn a function that maps new examples to output labels.

ML Zoomcamp 1.2 - ML vs Rule-Based Systems (Alexey Grigorev)

The document compares rule-based systems and machine learning for spam detection. It begins by showing examples of spam emails and rules to identify them. However, more rules are needed as spammers adapt, making rule-based systems complex. The document then introduces machine learning, which involves extracting features from email data, training a model on many examples, and using the model to classify new emails. This approach can learn from data automatically and perform better than hand-crafted rules.

ML Zoomcamp 4 - Evaluation Metrics for Classification (Alexey Grigorev)

ML Zoomcamp 1.5 - Model Selection Process (Alexey Grigorev)

The document discusses the process of selecting the best machine learning model, including evaluating multiple models on a holdout dataset to determine which performs best. It notes common model types that could be considered, such as logistic regression, decision trees, and neural networks. It also addresses the need to split data into training, validation, and test sets to assess model performance accurately and to avoid overly optimistic conclusions caused by comparing many models during selection.

ML Zoomcamp 6 - Decision Trees and Ensemble Learning (Alexey Grigorev)

ML Zoomcamp 8 - Neural networks and deep learning (Alexey Grigorev)

ML Zoomcamp - Course Overview and Logistics (Alexey Grigorev)

This document outlines the course plan for the Machine Learning Zoomcamp hosted by DataTalks.Club. The 11-session course covers topics ranging from introductory machine learning concepts to advanced techniques like deep learning, model deployment, and Kubernetes. Participants should have some Python experience and be comfortable with the command line. Completing homework assignments and projects can earn participants up to 100 points and a place on a public leaderboard. The goal is to help attendees learn applied machine learning skills in a collaborative, public setting.

ML Zoomcamp 2.1 - Car Price Prediction Project (Alexey Grigorev)

This document outlines the plan for a car price prediction project. The plan includes preparing and exploring the data, using linear regression to predict prices, understanding how linear regression works, evaluating the model's accuracy, engineering features, regularizing the model, and implementing the model. The source code for the project is available at a GitHub link provided. The next step mentioned is exploratory data analysis of the data.

ML Zoomcamp 2 - Slides (Alexey Grigorev)

MLOps week 1 intro (Alexey Grigorev)

MLOps at OLX (Alexey Grigorev)

This document discusses MLOps at OLX, including:
- The main areas of data science work at OLX, such as search, recommendations, fraud detection, and content moderation.
- How OLX uses teams structured by both feature areas and roles to collaborate on projects.
- A maturity model for MLOps, with levels from no MLOps to fully automated processes.
- How OLX has moved from siloed work to cross-functional teams and added more automation to model creation, release, and application integration over time.

State Of GPT (ssuser6f266e)

Andrej Karpathy, a founding member of OpenAI, explains in "State of GPT" how GPT models are trained and surveys the emerging ecosystem of large language models. Training starts with pre-training on large datasets, which produces the base model through tokenization and translation. Karpathy also explains that Llama, a smaller model, is more powerful than GPT-3 despite having fewer parameters. The speaker discusses the training of Transformer models for language modeling, followed by the evolution of base models that have appeared since GPT-2. The training process consists of pre-training, supervised fine-tuning, reward modeling, and reinforcement learning. The speaker also talks about improving the performance of Transformers through prompting, self-consistency, and prompt engineering. Finally, the speaker addresses the limitations of LLMs, including biases and reasoning errors, and suggests using them in low-stakes applications with human oversight.

TTF.52 (Arcee327)

Fivetran pitch deck (Tech in Asia)

TTF.SP.30.TT (Arcee327)

Zagor sd 038 izgubljeni sin (big guy,strider&unregistred & emeri)(5... (zoran radovic)

TTF.OG.29 (Arcee327)

ZS - 0166 - Teks Viler - COVEK IZ KENTAKIJA (Stripovizijacom)

PM Interview Evaluation Sheet: Product Design Question (Lewis Lin 🦊)

Mynt Pitch Deck (XavierRoss5)

Mynt aims to disrupt the paint industry through a tech-driven direct-to-consumer and business-to-business model for sustainable paint products. They plan to reinvent paint sales through digital tools, natural ingredients, and a seamless online customer experience to address issues with conventional paints like toxicity, carbon footprint, and plastic waste. Mynt has developed an innovative digital color matching and mixing solution called Myx+Match and sees opportunities in trends toward sustainable consumption and digital adoption in the paint and home goods market.

Fine tuning large LMs (SylvainGugger)

This document discusses techniques for fine-tuning large pre-trained language models without access to a supercomputer. It describes the history of transformer models and how transfer learning works. It then outlines several techniques for reducing memory usage during fine-tuning, including reducing batch size, gradient accumulation, gradient checkpointing, mixed precision training, and distributed data parallelism approaches like ZeRO and pipelined parallelism. Resources for implementing these techniques are also provided.

Zagor Ludens 215 - Sjene u šumi (Stripovizijacom)

ML Zoomcamp 10 - Kubernetes (Alexey Grigorev)

Dropbox's original pitch deck (Pitch Decks)

Dropbox provides secure file sharing, collaboration, and storage solutions. As of December 31, 2020, the company had approximately 700 million registered users across 180 countries. The company was formerly known as Evenflow, Inc. and changed its name to Dropbox, Inc. in October 2009. Dropbox made its debut on the startup scene at the Y Combinator Demo Day in the summer of 2007. Sequoia Capital shared Dropbox’s original pitch deck, which helped the company raise its $1.2M seed round that year. In just 17 slides, the presentation makes a compelling case for their freemium business model and viral acquisition strategy. See more: bestpitchdeck.com/dropbox bestpitch.es/dropbox-pitch-deck-template

Ada-Health-Pitch-Deck (GeorgeNelson33)

Lms 340 - veliki blek - blek u londonu (Stripovizijacom)

Codementor - Data Science at OLX (Alexey Grigorev)

This document summarizes a presentation about data science at OLX. It discusses OLX's moderation and recommender systems. For moderation, it describes OLX's machine learning models that automatically moderate listings for issues like duplicates, spam, and illegal/NSFW content. Moderators review flagged content. For recommendations, it discusses collaborative filtering and item embeddings to suggest relevant listings to users. It also outlines OLX's team structure, goal setting process, and expectations for data scientists, which include a focus on modeling, evaluation and some production work.

Data Monitoring with whylogs (Alexey Grigorev)

Whylogs is an open source tool for data monitoring that automatically creates statistical summaries called profiles of datasets. It helps with data monitoring by generating these profiles which can be compared over time to detect changes visually or programmatically. This allows issues like schema changes or bugs in data pipelines to be identified. The profiles have properties like being descriptive, lightweight and mergeable, which enables monitoring across distributed systems by allowing profile data to be logically merged. Whylogs thus provides a step towards observability of data systems.

More from Alexey Grigorev (16)

Codementor - Data Science at OLX (Alexey Grigorev)

Data Monitoring with whylogs (Alexey Grigorev)

Data engineering zoomcamp introduction (Alexey Grigorev)

The document outlines the plan and syllabus for a Data Engineering Zoomcamp hosted by DataTalks.Club. It introduces the four instructors for the course - Ankush Khanna, Sejal Vaidya, Victoria Perez Mola, and Alexey Grigorev. The 10-week course will cover topics like data ingestion, data warehousing with BigQuery, analytics engineering with dbt, batch processing with Spark, streaming with Kafka, and a culminating 3-week student project. Pre-requisites include experience with Python, SQL, and the command line. Course materials will be pre-recorded videos and there will be weekly live office hours for support. Students can earn a certificate and compete on a

AI in Fashion - Size & Fit - Nour Karessli (Alexey Grigorev)

This document discusses Zalando's use of AI to improve size and fit recommendations for customers. It outlines several challenges including varying size conventions, limited fit data for new items, and sparse customer purchase histories. It then describes Zalando's approaches to address these, including algorithms that use item images to predict sizes for new items lacking data (SizeNet) and models that learn from customers' past purchases and feedback to provide personalized size recommendations. The goal is to help customers find the right fit on their first purchase to reduce returns and improve the shopping experience.

AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova (Alexey Grigorev)

Computer vision techniques like facial recognition and image captioning can help automate metadata generation for media companies. Facial recognition can identify people in photos to assist editors and improve searchability, while image captioning can propose captions. A case study of applying these techniques to photos from English Premier League football games achieved 99% accuracy for facial recognition and precision of 78.7% for image captioning. Combining the two allows generating customized captions that include names identified through facial recognition. Challenges remain when the automatic caption does not match details in the image.

Paradoxes in Data Science (Alexey Grigorev)

This document discusses several paradoxes that can arise in data science. It begins by discussing modelling and simulations that can be used when data is unavailable. It then outlines Simpson's Paradox, where a trend seen in groups disappears or reverses when the groups are combined. Next, it discusses the accuracy paradox, where a metric stops being useful once it becomes the target. It also discusses the learnability-Gödel paradox, which relates to the limitations of mathematics described by Gödel's incompleteness theorems. Finally, it discusses the law of unintended consequences as it relates to data science.

Algorithmic fairness (Alexey Grigorev)

An algorithm is considered fair if its results and performance are independent of sensitive variables like gender, ethnicity, etc. Fairness can be introduced at different stages of model development, such as in data collection, preparation, and model selection. Techniques for identifying and mitigating bias include causal reasoning, explainability, fairness metrics, and counterfactuals. Counterfactual fairness evaluates predictions across different protected attribute values while holding other variables constant. Explainability helps ensure models make decisions for the right reasons. Overall fairness aims to achieve equal outcomes or opportunities across groups.

Introduction to Transformers for NLP - Olga Petrova (Alexey Grigorev)

Olga Petrova gives an introduction to transformers for natural language processing (NLP). She begins with an overview of representing words using tokenization, word embeddings, and one-hot encodings. Recurrent neural networks (RNNs) are discussed as they are important for modeling sequential data like text, but they struggle with long-term dependencies. Attention mechanisms were developed to address this by allowing the model to focus on relevant parts of the input. Transformers use self-attention and have achieved state-of-the-art results in many NLP tasks. Bidirectional Encoder Representations from Transformers (BERT) provides contextualized word embeddings trained on large corpora.

ML Zoomcamp Week #2 Office Hours (Alexey Grigorev)

AMLD2021 - ML in online marketplaces (Alexey Grigorev)

This document discusses the use of machine learning in online marketplaces. It outlines how machine learning is used for recommendations, search, trust and safety, seller experience, and pricing/monetization. Specific applications mentioned include collaborative and content-based recommendation systems, ranking models for search, automated content moderation, image quality assessment, dynamic pricing, and promoting listings. The document provides examples of algorithms like counting, collaborative filtering, learning to rank, and neural networks that power these machine learning applications in online marketplaces.

ML Zoomcamp 1.10 - Summary (Alexey Grigorev)

This document contains summaries from multiple sessions of a machine learning zoomcamp. It introduces machine learning concepts like supervised learning, the CRISP-DM process, model selection, linear algebra, and the Python libraries NumPy and Pandas. It also discusses setting up an environment for machine learning and provides example data and models for tasks like email spam detection and car price prediction.

ML Zoomcamp 1.8 - Linear Algebra Refresher (Alexey Grigorev)

ML Zoomcamp 1.1 - Introduction to Machine Learning (Alexey Grigorev)

This document introduces machine learning and how it can be used to predict car prices based on characteristics like year, make, mileage, and other available data. It explains that an expert can use this data to determine a car's price, and a machine learning model can be trained to do the same by learning patterns in the data. The model would be trained on sample data that contains the features known about each car along with the target price value. It could then be used to predict prices for new cars by taking in their features. This allows applying the patterns learned during training to make useful predictions without needing expert knowledge.

Machine Learning in Online Marketplaces (Alexey Grigorev)

The document discusses machine learning use cases in online marketplaces. It outlines how ML can be used for search, recommendations, trust and safety, seller experience, and pricing/monetization. Specific applications include recommending similar products, learning user preferences for search rankings, detecting illegal/unsafe content, assessing listing quality, and determining optimal pricing. The document provides examples of algorithms like collaborative filtering, neural networks, and learning to rank that power these ML systems in marketplaces.

From Software Engineering To Machine Learning (Alexey Grigorev)

This document provides guidance on transitioning from a software engineering background to machine learning. It recommends learning fundamentals like Python, NumPy, and Pandas first before more complex algorithms. The best way to learn is through hands-on projects, starting with simple algorithms and evaluating models. Deploying models is described as easy for engineers but difficult for data scientists. Community involvement is encouraged to avoid working alone. Real-world projects are presented from domains like car pricing, customer churn, credit risk, and image classification to illustrate learning concepts.

3 hacks to accelerate your data science career (Alexey Grigorev)

This document provides 3 hacks to accelerate a data science career: 1. Be friends with your product manager to learn about prioritization, communication, planning, users, and marketing. 2. Be visible by giving demos of your work, creating frontends for models, speaking at internal events, and writing blog posts. 3. Become part of the data science community by participating in meetups and conferences.

Recently uploaded (20)

GRADE-1-QUARTER 4-MATHEMATICS-WEEK-3.pptx (AngellieMaeDoce)

Cyrus_Kelisha_SMM_PB1_2024-November.pptx (KelishaCyrus)

How to Configure Deliver Content by Email in Odoo 18 Sales (Celine George)

CRITICAL THINKING AND NURSING JUDGEMENT.pptx (PoojaSen20)

Research & Research Methods: Basic Concepts and Types.pptx (Dr. Sarita Anand)

Chapter 2. Strategic Management: Corporate Governance.pdf (Rommel Regala)

How to Configure Proforma Invoice in Odoo 18 Sales (Celine George)

ASP.NET Web API Interview Questions By Scholarhat (Scholarhat)

lklklklklklklklklklklklklklklklklklklklklklklklklklklklk (preetheshparmar)

OOPs Interview Questions PDF By ScholarHat (Scholarhat)

Meeting the needs of modern students?, Selina McCoy (Economic and Social Research Institute)

Annex-C-1_COT-Rubric-for-Proficient-Teacher-_1_.ppt (joan dalilis)

Dot NET Core Interview Questions PDF By ScholarHat (Scholarhat)

Chapter 1. Basic Concepts of Strategic Management.pdf (Rommel Regala)

How to Setup WhatsApp in Odoo 17 - Odoo Slides (Celine George)

BISNIS BERKAH BERANGKAT KE MEKKAH ISTIKMAL SYARIAH (coacharyasetiyaki)

Managing expiration dates of products in odoo (Celine George)

Azure Data Engineer Interview Questions By ScholarHat (Scholarhat)

The Constitution, Government and Law making bodies . (saanidhyapatel09)

DBMS Interview Questions PDF By ScholarHat (Scholarhat)
