The document discusses the CRISP-DM methodology for machine learning projects. It describes the six main steps: 1) business understanding to define goals, 2) data understanding to analyze sources, 3) data preparation to transform data, 4) modeling to select the best model, 5) evaluation to validate goals are met, and 6) deployment to production. These steps provide an iterative process from problem definition to model deployment.
ML Zoomcamp 1.3 - Supervised Machine Learning (Alexey Grigorev)
The document discusses supervised machine learning. It defines supervised machine learning and provides examples of regression, classification, and ranking problems. It also includes examples of datasets with features and target values for classification and regression problems. Machine learning algorithms are trained on these labeled examples to learn a function that maps new examples to output labels.
The document compares rule-based systems and machine learning for spam detection. It begins by showing examples of spam emails and rules to identify them. However, more rules are needed as spammers adapt, making rule-based systems complex. The document then introduces machine learning, which involves extracting features from email data, training a model on many examples, and using the model to classify new emails. This approach can learn from data automatically and perform better than hand-crafted rules.
The document discusses the process of selecting the best machine learning model, including evaluating multiple models on a holdout dataset to determine the best performing one. It notes common model types like logistic regression, decision trees, and neural networks that could be considered. It also addresses the need to split data into training, validation, and test sets to accurately assess model performance and avoid overfitting conclusions from multiple comparisons of models during selection.
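The split-then-select procedure summarized above can be sketched roughly as follows. This is a minimal illustration rather than the presentation's own code: the toy data, the 60/20/20 proportions, and the threshold "models" are all assumptions made for the example.

```python
import random

def split_dataset(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def accuracy(model, rows):
    """Fraction of (x, y) rows for which model(x) equals y."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def select_best(models, val):
    """Return the name of the model with the highest validation accuracy."""
    return max(models, key=lambda name: accuracy(models[name], val))

# Hypothetical toy data: x is a number, y is 1 when x >= 50
data = [(x, int(x >= 50)) for x in range(100)]
train, val, test = split_dataset(data)

# Hypothetical candidates: fixed threshold rules standing in for trained models
candidates = {
    "threshold_50": lambda x: int(x >= 50),
    "threshold_30": lambda x: int(x >= 30),
    "threshold_70": lambda x: int(x >= 70),
}

# Pick the winner on the validation set, then report it once on the test set
best = select_best(candidates, val)
print(best, accuracy(candidates[best], test))
```

The test set is touched only once, after selection, so its score is not biased by the comparison of several models on the validation set.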
This document outlines the course plan for the Machine Learning Zoomcamp hosted by DataTalks.Club. The 11-session course covers topics ranging from introductory machine learning concepts to advanced techniques such as deep learning, model deployment, and Kubernetes. Participants should have some Python experience and be comfortable with the command line. Completing homework assignments and projects can earn participants up to 100 points, which are listed on a public leaderboard. The goal is to help attendees learn applied machine learning skills in a collaborative, public setting.
ML Zoomcamp 2.1 - Car Price Prediction Project (Alexey Grigorev)
This document outlines the plan for a car price prediction project. The plan includes preparing and exploring the data, using linear regression to predict prices, understanding how linear regression works, evaluating the model's accuracy, engineering features, regularizing the model, and implementing the model. The source code for the project is available at a GitHub link provided. The next step mentioned is exploratory data analysis of the data.
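As a rough illustration of the linear regression, evaluation, and regularization steps listed in that plan, here is a one-feature sketch. It is not the project's code: the single feature (car age), the prices, and the function names are hypothetical, and the regularization shown is a ridge-style penalty on the slope only.

```python
import math

def fit_simple_linear_regression(xs, ys, r=0.0):
    """Least-squares fit of y ~ w0 + w * x for one feature.

    r adds a ridge-style penalty on the slope; r = 0 is ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / (sxx + r)
    w0 = mean_y - w * mean_x
    return w0, w

def rmse(ys, predictions):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))

# Hypothetical data: car age in years and price in dollars
ages = [1, 2, 3, 4, 5]
prices = [28000, 26000, 24000, 22000, 20000]

w0, w = fit_simple_linear_regression(ages, prices)
predictions = [w0 + w * age for age in ages]
print(w0, w, rmse(prices, predictions))

# A larger r shrinks the slope towards zero
w0_reg, w_reg = fit_simple_linear_regression(ages, prices, r=10.0)
print(w0_reg, w_reg)
```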
This document discusses MLOps at OLX, including:
- The main areas of data science work at OLX like search, recommendations, fraud detection, and content moderation.
- How OLX uses teams structured by both feature areas and roles to collaborate on projects.
- A maturity model for MLOps with levels from no MLOps to fully automated processes.
- How OLX has improved from siloed work to cross-functional teams and adding more automation to model creation, release, and application integration over time.
Andrej Karpathy, a founding member of OpenAI, explains in "State of GPT" how GPT models are trained and surveys the emerging ecosystem of large language models. Training starts with pre-training on large datasets, which generates the base model through tokenization and translation. Karpathy also explains that LLaMA is more powerful than GPT-3 despite containing fewer parameters. He discusses the training of Transformer models for language modeling, followed by the evolution of base models that have arisen since GPT-2. The training process consists of pre-training, supervised fine-tuning, reward modeling, and reinforcement learning. He also covers ways to improve the performance of Transformers through prompting, self-consistency, and prompt engineering. Finally, he addresses the limitations of LLMs, including biases and reasoning errors, and suggests using them in low-stakes applications with human oversight.
Mynt aims to disrupt the paint industry through a tech-driven direct-to-consumer and business-to-business model for sustainable paint products. They plan to reinvent paint sales through digital tools, natural ingredients, and a seamless online customer experience to address issues with conventional paints like toxicity, carbon footprint, and plastic waste. Mynt has developed an innovative digital color matching and mixing solution called Myx+Match and sees opportunities in trends toward sustainable consumption and digital adoption in the paint and home goods market.
This document discusses techniques for fine-tuning large pre-trained language models without access to a supercomputer. It describes the history of transformer models and how transfer learning works. It then outlines several techniques for reducing memory usage during fine-tuning, including reducing batch size, gradient accumulation, gradient checkpointing, mixed precision training, and distributed data parallelism approaches like ZeRO and pipelined parallelism. Resources for implementing these techniques are also provided.
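Of the techniques listed, gradient accumulation is the easiest to show in isolation. The sketch below is framework-free and purely illustrative: the one-parameter model `y_hat = w * x`, the data, and the function names are assumptions, not code from the presentation. It shows that summing suitably scaled micro-batch gradients reproduces the full-batch gradient while only one micro-batch needs to be held in memory at a time.

```python
def mse_gradient(w, batch):
    """Gradient of mean squared error with respect to w for y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_gradient(w, batch, micro_batch_size):
    """Accumulate micro-batch gradients into one full-batch gradient."""
    total = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro_batch = batch[start:start + micro_batch_size]
        # Weight each micro-batch gradient by its share of the full batch
        total += mse_gradient(w, micro_batch) * len(micro_batch) / len(batch)
    return total

# Hypothetical data following y = 3 * x
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 1.0

full = mse_gradient(w, batch)
accumulated = accumulated_gradient(w, batch, micro_batch_size=2)
print(full, accumulated)
```

In a real training loop the optimizer step would be taken once after the accumulation loop, which is what makes the effective batch size larger than the micro-batch that fits in memory.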
Dropbox provides secure file sharing, collaboration, and storage solutions. As of December 31, 2020, the company had approximately 700 million registered users across 180 countries. The company was formerly known as Evenflow, Inc. and changed its name to Dropbox, Inc. in October 2009.
Dropbox made its debut on the startup scene at the Y Combinator Demo Day in the summer of 2007. Sequoia Capital shared Dropbox's original pitch deck, which helped the company raise its $1.2M seed round that year. In just 17 slides, the presentation makes a compelling case for their freemium business model and viral acquisition strategy.
See more: bestpitchdeck.com/dropbox
bestpitch.es/dropbox-pitch-deck-template
This document summarizes a presentation about data science at OLX. It discusses OLX's moderation and recommender systems. For moderation, it describes OLX's machine learning models that automatically moderate listings for issues like duplicates, spam, and illegal/NSFW content. Moderators review flagged content. For recommendations, it discusses collaborative filtering and item embeddings to suggest relevant listings to users. It also outlines OLX's team structure, goal setting process, and expectations for data scientists, which include a focus on modeling, evaluation and some production work.
Whylogs is an open source tool for data monitoring that automatically creates statistical summaries called profiles of datasets. It helps with data monitoring by generating these profiles which can be compared over time to detect changes visually or programmatically. This allows issues like schema changes or bugs in data pipelines to be identified. The profiles have properties like being descriptive, lightweight and mergeable, which enables monitoring across distributed systems by allowing profile data to be logically merged. Whylogs thus provides a step towards observability of data systems.
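The "mergeable" property can be illustrated with a small stand-in. The class below is not the whylogs API; `ColumnProfile` and `profile` are hypothetical names for a sketch that tracks only count, sum, minimum, and maximum. The point is that profiles built on separate data partitions can be combined into the same profile that a single pass over all the data would produce.

```python
from dataclasses import dataclass

@dataclass
class ColumnProfile:
    """Lightweight summary of a numeric column (hypothetical, not whylogs)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def track(self, value):
        """Update the summary with one value without storing the value itself."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        """Combine two summaries into one covering both sets of values."""
        return ColumnProfile(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        return self.total / self.count if self.count else float("nan")

def profile(values):
    """Build a profile from an iterable of numbers."""
    result = ColumnProfile()
    for value in values:
        result.track(value)
    return result

# Two partitions profiled separately, then merged
merged = profile([1, 2, 3]).merge(profile([4, 5]))
print(merged, merged.mean)
```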
The document outlines the plan and syllabus for a Data Engineering Zoomcamp hosted by DataTalks.Club. It introduces the four instructors for the course - Ankush Khanna, Sejal Vaidya, Victoria Perez Mola, and Alexey Grigorev. The 10-week course will cover topics like data ingestion, data warehousing with BigQuery, analytics engineering with dbt, batch processing with Spark, streaming with Kafka, and a culminating 3-week student project. Pre-requisites include experience with Python, SQL, and the command line. Course materials will be pre-recorded videos and there will be weekly live office hours for support. Students can earn a certificate and compete on a
This document discusses Zalando's use of AI to improve size and fit recommendations for customers. It outlines several challenges including varying size conventions, limited fit data for new items, and sparse customer purchase histories. It then describes Zalando's approaches to address these, including algorithms that use item images to predict sizes for new items lacking data (SizeNet) and models that learn from customers' past purchases and feedback to provide personalized size recommendations. The goal is to help customers find the right fit on their first purchase to reduce returns and improve the shopping experience.
AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova (Alexey Grigorev)
Computer vision techniques like facial recognition and image captioning can help automate metadata generation for media companies. Facial recognition can identify people in photos to assist editors and improve searchability, while image captioning can propose captions. A case study of applying these techniques to photos from English Premier League football games achieved 99% accuracy for facial recognition and precision of 78.7% for image captioning. Combining the two allows generating customized captions that include names identified through facial recognition. Challenges remain when the automatic caption does not match details in the image.
This document discusses several paradoxes that can arise in data science. It begins by discussing modelling and simulations that can be used when data is unavailable. It then outlines Simpson's Paradox, where a trend seen in groups disappears or reverses when the groups are combined. Next, it discusses the accuracy paradox, where a metric stops being useful once it becomes the target. It also discusses the learnability-Godel paradox related to the limitations of mathematics according to Godel's incompleteness theorems. Finally, it discusses the law of unintended consequences as it relates to data science.
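Simpson's Paradox is easier to see with numbers. The counts below are illustrative values assumed for this example, not data from the presentation: option A has the higher success rate within each group, yet option B has the higher rate once the groups are pooled, because the two options were applied to the groups in very different proportions.

```python
# (successes, trials) per option and group; illustrative counts only
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def group_rate(option, group):
    """Success rate of one option within one group."""
    successes, trials = results[option][group]
    return successes / trials

def overall_rate(option):
    """Success rate of one option with all groups pooled."""
    successes = sum(s for s, _ in results[option].values())
    trials = sum(n for _, n in results[option].values())
    return successes / trials

for group in ("small", "large"):
    print(group, round(group_rate("A", group), 3), round(group_rate("B", group), 3))
print("overall", round(overall_rate("A"), 3), round(overall_rate("B"), 3))
```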
An algorithm is considered fair if its results and performance are independent of sensitive variables like gender, ethnicity, etc. Fairness can be introduced at different stages of model development, such as in data collection, preparation, and model selection. Techniques for identifying and mitigating bias include causal reasoning, explainability, fairness metrics, and counterfactuals. Counterfactual fairness evaluates predictions across different protected attribute values while holding other variables constant. Explainability helps ensure models make decisions for the right reasons. Overall fairness aims to achieve equal outcomes or opportunities across groups.
Introduction to Transformers for NLP - Olga Petrova (Alexey Grigorev)
Olga Petrova gives an introduction to transformers for natural language processing (NLP). She begins with an overview of representing words using tokenization, word embeddings, and one-hot encodings. Recurrent neural networks (RNNs) are discussed as they are important for modeling sequential data like text, but they struggle with long-term dependencies. Attention mechanisms were developed to address this by allowing the model to focus on relevant parts of the input. Transformers use self-attention and have achieved state-of-the-art results in many NLP tasks. Bidirectional Encoder Representations from Transformers (BERT) provides contextualized word embeddings trained on large corpora.
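The self-attention step mentioned above can be sketched in a few lines. This is a simplified illustration, not code from the talk: it assumes scaled dot-product attention and uses the input embeddings directly as queries, keys, and values, whereas a real Transformer applies learned projection matrices first. The function names and the toy embeddings are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(embeddings[0])
    outputs = []
    for query in embeddings:
        # Similarity of this token to every token, scaled by sqrt(dim)
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors
        outputs.append([
            sum(weight * value[i] for weight, value in zip(weights, embeddings))
            for i in range(dim)
        ])
    return outputs

# Hypothetical 2-dimensional embeddings for a three-token sentence
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
print(attended)
```

Every output vector mixes information from all positions in one step, which is how attention sidesteps the long-range dependency problem of RNNs.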
This document discusses the use of machine learning in online marketplaces. It outlines how machine learning is used for recommendations, search, trust and safety, seller experience, and pricing/monetization. Specific applications mentioned include collaborative and content-based recommendation systems, ranking models for search, automated content moderation, image quality assessment, dynamic pricing, and promoting listings. The document provides examples of algorithms like counting, collaborative filtering, learning to rank, and neural networks that power these machine learning applications in online marketplaces.
This document contains summaries from multiple sessions of a machine learning zoomcamp. It introduces machine learning concepts like supervised learning, the CRISP-DM process, model selection, linear algebra, and the Python libraries NumPy and Pandas. It also discusses setting up an environment for machine learning and provides example data and models for tasks like email spam detection and car price prediction.
The document outlines a session on vector and matrix operations, including vector operations, vector-vector multiplication, matrix-vector multiplication, matrix-matrix multiplication, and computing the matrix inverse. It provides examples of performing these operations on vectors and matrices.
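The operations listed can be written out in plain Python to make the definitions concrete. This is a sketch with hypothetical function names; the course itself mentions NumPy, which would normally be used instead. The inverse is shown only for the 2x2 case.

```python
def dot(u, v):
    """Vector-vector multiplication (dot product)."""
    return sum(a * b for a, b in zip(u, v))

def matrix_vector(matrix, v):
    """Matrix-vector multiplication: one dot product per row."""
    return [dot(row, v) for row in matrix]

def matrix_matrix(left, right):
    """Matrix-matrix multiplication: rows of left times columns of right."""
    columns = list(zip(*right))
    return [[dot(row, column) for column in columns] for row in left]

def inverse_2x2(matrix):
    """Inverse of a 2x2 matrix; raises ValueError if it is singular."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4, 7], [2, 6]]
print(dot([1, 2, 3], [4, 5, 6]))
print(matrix_vector([[1, 2], [3, 4]], [5, 6]))
print(matrix_matrix([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# Multiplying a matrix by its inverse gives (approximately) the identity
print(matrix_matrix(A, inverse_2x2(A)))
```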
ML Zoomcamp 1.1 - Introduction to Machine Learning (Alexey Grigorev)
This document introduces machine learning and how it can be used to predict car prices based on characteristics like year, make, mileage, and other available data. It explains that an expert can use this data to determine a car's price, and a machine learning model can be trained to do the same by learning patterns in the data. The model would be trained on sample data that contains the features known about each car along with the target price value. It could then be used to predict prices for new cars by taking in their features. This allows applying the patterns learned during training to make useful predictions without needing expert knowledge.
The document discusses machine learning use cases in online marketplaces. It outlines how ML can be used for search, recommendations, trust and safety, seller experience, and pricing/monetization. Specific applications include recommending similar products, learning user preferences for search rankings, detecting illegal/unsafe content, assessing listing quality, and determining optimal pricing. The document provides examples of algorithms like collaborative filtering, neural networks, and learning to rank that power these ML systems in marketplaces.
From Software Engineering To Machine Learning (Alexey Grigorev)
This document provides guidance on transitioning from a software engineering background to machine learning. It recommends learning fundamentals like Python, NumPy, and Pandas first before more complex algorithms. The best way to learn is through hands-on projects, starting with simple algorithms and evaluating models. Deploying models is described as easy for engineers but difficult for data scientists. Community involvement is encouraged to avoid working alone. Real-world projects are presented from domains like car pricing, customer churn, credit risk, and image classification to illustrate learning concepts.
3 hacks to accelerate your data science career (Alexey Grigorev)
This document provides 3 hacks to accelerate a data science career:
1. Be friends with your product manager to learn about prioritization, communication, planning, users, and marketing.
2. Be visible by giving demos of your work, creating frontends for models, speaking at internal events, and writing blog posts.
3. Become part of the data science community by participating in meetups and conferences.
How to Configure Deliver Content by Email in Odoo 18 Sales (Celine George)
In this slide, we'll discuss how to configure a proforma invoice in the Odoo 18 Sales module. A proforma invoice is a preliminary invoice that serves as a commercial document issued by a seller to a buyer.
Research & Research Methods: Basic Concepts and Types.pptx (Dr. Sarita Anand)
This presentation was made for postgraduate students in the social sciences and humanities, such as M.Ed., M.A. (Education), and Ph.D. scholars. It will also be useful for teachers and other faculty members interested in research and in teaching research concepts.
How to Configure Proforma Invoice in Odoo 18 Sales (Celine George)
In this slide, we'll discuss how to configure a proforma invoice in the Odoo 18 Sales module. A proforma invoice is a preliminary invoice that serves as a commercial document issued by a seller to a buyer.
Chapter 1. Basic Concepts of Strategic Management.pdf (Rommel Regala)
This course provides students with a comprehensive understanding of strategic management principles, frameworks, and applications in business. It explores strategic planning, environmental analysis, corporate governance, business ethics, and sustainability. The course integrates Sustainable Development Goals (SDGs) to enhance global and ethical perspectives in decision-making.
How to Setup WhatsApp in Odoo 17 - Odoo Slides (Celine George)
Integrate WhatsApp into Odoo using the WhatsApp Business API or third-party modules to enhance communication. This integration enables automated messaging and customer interaction management within Odoo 17.
Managing expiration dates of products in Odoo (Celine George)
Odoo allows users to set expiration dates at both the product and batch levels, providing flexibility and accuracy. By using Odoo's expiration date management, companies can minimize waste, optimize stock rotation, and maintain high standards of product quality.
The Constitution, Government and Law making bodies (saanidhyapatel09)
This PowerPoint presentation provides an overview of the Constitution, covering its key principles, features, and significance. It explains the fundamental rights, duties, structure of government, and the importance of constitutional law in governance. Ideal for students, educators, and anyone interested in understanding the foundation of a nation's legal framework.