Road to Winning at Horse Racing with Data Science (Shun Nukui)
This document discusses developing an AI system for predicting horse race outcomes in order to make a profit. It summarizes the project's goals of defining an objective function for predictions, feature engineering using over 1,500 horse racing metrics, and training a LightGBM model on the data. Evaluation is done using nDCG to measure prediction accuracy against different scoring systems like horse placement, odds, and popularity. The goal is to predict horses that the public may miss in order to have higher returns.
Software Defect Prediction on Unlabeled Datasets (Sung Kim)
The document describes techniques for software defect prediction when labeled training data is unavailable. It proposes Transfer Defect Learning (TCA+) to improve cross-project defect prediction by normalizing data distributions between source and target projects. For projects with heterogeneous metrics, it introduces Heterogeneous Defect Prediction (HDP) which matches similar metrics between source and target to build cross-project prediction models. It also discusses CLAMI for defect prediction using only unlabeled data without human effort. The techniques are evaluated on open source projects to demonstrate their effectiveness compared to traditional cross-project and within-project prediction.
The document summarizes a dissertation defense about adaptive bug prediction by analyzing project history. It discusses the motivation for leveraging project history and software configuration management data for bug prediction. It also describes creating a corpus by identifying bug-fix changes and bug-introducing changes from commits. The dissertation proposes using a "bug cache" to predict likely locations of future bugs based on past bug occurrences.
Measuring the Quality of Online Service - Jinyoung Kim (Jin Young Kim)
This document discusses methods for measuring the quality of online services. It describes how major companies like Google, Facebook, and Netflix collect data through user behavior, panel surveys, and direct user feedback at different stages of their services. Panel surveys can provide insights but have limitations, while user behavior data is abundant but noisy. The document also provides examples of how to design panel surveys and side-by-side evaluations to assess search engine result pages. It concludes that the best approach is to combine various data collection methods depending on the service characteristics and lifecycle.
Liver cancer affects many people and their families. It has no known cure yet, but treatments like chemotherapy and surgery are used. The document discusses the history of cancer research and treatments over time. It also provides potential risk factors and ways to help prevent cancer, as well as organizations working toward a cure. The author has a personal connection, as their great grandmother passed away from liver cancer.
3rd Hour - Homelessness Around The World. By Emily A. Scharich (yourpassport)
This document discusses homelessness around the world. It begins with an introduction stating that homelessness is a global issue that affects children and can influence their views. It then provides an overview of some organizations that provide aid to the homeless, such as food and shelters. The document also examines the financial and long-term human impacts of homelessness increasing worldwide.
The United States has been engaged in the War in Afghanistan since 2001 following the 9/11 terrorist attacks. Over 1,196 U.S. troops have died and the war has cost $285 billion to date. The war has severely impacted both U.S. and Afghan families through loss of life and ongoing violence and instability in Afghanistan. Many argue the U.S. should begin withdrawing troops from Afghanistan, though others believe this could further destabilize the region. There are ongoing debates around how to best bring peace and security to Afghanistan.
Obesity in America has become a major problem, affecting people of all ages including children. It can cause serious health issues like heart disease, diabetes, and cancer. The document discusses obesity rates in different US states and globally, with Mississippi having the highest average BMI. Solutions proposed include eating healthier, exercising more, and improving processed foods. The presenter hopes to raise awareness and motivate lifestyle changes to address this important issue.
An extremely powerful 9.0 magnitude earthquake struck off the coast of Japan's Tohoku region on March 11, 2011, causing widespread damage and loss of life. It was the most powerful earthquake ever recorded in Japan and displaced Honshu eastward by 2.4 meters. Critical infrastructure was severely impacted, fires broke out at an oil refinery, and many residents were left homeless or orphaned in the aftermath of the devastating quake.
Child abuse occurs worldwide and takes many forms including physical, emotional, and sexual abuse as well as neglect. It has long-lasting negative effects on children's development and mental health. While many cases go unreported, addressing child abuse requires awareness of warning signs, ensuring children's basic needs are met, and creating a support system to protect them. Some jurisdictions have had success eliminating child abuse through community intervention and policy changes.
The document discusses the benefits of wind turbines as an energy source. It notes that wind turbines are an efficient way to generate power without relying on other countries and that the US government has provided $90 billion to install wind turbines across the country. The author believes wind turbines are a better energy solution and has a personal connection to them from seeing some near where they golf.
Poaching occurs around the world, mostly in Southeast Africa and other areas with high wildlife populations. Poaching involves illegally taking wild plants and animals. It is harmful because it can lead species to become endangered or extinct if people exceed legal limits. People poach for money due to greed or need, putting many species at risk. If poaching continues, we could lose many animal populations and disrupt natural predator-prey balances. More conservation officers are needed to curb heavy poaching in vulnerable areas.
Cyclones are the most destructive storms in the world. Typhoon Tip, which occurred in the Pacific in 1979, remains the largest and most intense tropical cyclone ever recorded. Cyclones form from clusters of strong thunderstorms and have an eye, the calmest area with low pressure and no rain, surrounded by an eyewall. They are most common in ocean regions like the Pacific, where around 15 occur each year, and can cause major economic damage, like Hurricane Katrina which cost nearly $1 billion.
DnA Playshop - Serious Fun with LEGO.pptx (Jin Young Kim)
This document discusses LEGO facts and lessons that can be learned from LEGO as a platform. It notes that LEGO is a Danish company, Legoland is not owned by the LEGO group, and retired LEGO sets often increase significantly in value. It highlights LEGO's obsession with production quality, use of generic and versatile components, intuitive design and instructions, and vibrant fan community. The document also advertises a DnA Playshop event where participants will team up, build items from instructions, build something new together, have their creation voted on by others, and keep the LEGO bricks used.
Frontiers in Data Science For Modern Web Search Engine (Jin Young Kim)
1) Modern web search engines use centralized data and analytics platforms to monitor key performance indicators and growth strategies across teams. This allows for internal alignment.
2) Search engines must adapt to dynamic environments through continuous search quality monitoring and experimentation to improve results.
3) Ensuring fairness in search results and recommendations helps create a healthy online ecosystem and promotes trustworthy content.
Subtleties in Tracking Happiness -- Seattle QS#10 (Jin Young Kim)
This document summarizes Jin Young Kim's approach to tracking and measuring happiness over time. Some key points:
- Kim tracks happiness using a 5-point scale recorded 3 times per day, and also logs factors like sleep, events, and states that may influence happiness.
- Happiness is evaluated based on both successful achievement and sense of well-being. Metrics are analyzed to identify patterns and improve lifestyle.
- Past tracking revealed cyclical happiness and importance of structure like work/deadlines. Early mornings and avoiding home increased happiness.
- Lessons include the impact of self-rating and need to turn insights into tangible results, like maintaining an average happiness score.
- Tracking improved self
SIGIR Tutorial on IR Evaluation: Designing an End-to-End Offline Evaluation P... (Jin Young Kim)
This tutorial aims to provide attendees with a detailed understanding of the end-to-end evaluation pipeline based on human judgments (offline measurement). The tutorial will give an overview of the state-of-the-art methods, techniques, and metrics necessary for each stage of the evaluation process. We will mostly focus on evaluating an information retrieval (search) system, but other tasks such as recommendation and classification will also be discussed. Practical examples will be drawn both from the literature and from real-world usage scenarios in industry.
The document discusses information retrieval (IR) research and introduces Jin Young Kim, a PhD student studying IR. Kim presents on designing retrieval models, recent trends in IR like personalized search and user modeling, and his own research projects in areas like structured document retrieval, personal search, and understanding book search behavior. The presentation aims to provide an overview of IR research and highlight some challenges and opportunities in the field.
24. ???? ????
- Academy vs. Industry (vs. Industry Lab)
- ???: ????? ??? ??? (until tenure)
- ???: ?? ? ???? ???
- ???: Best (or Worst) of Both?
- Korea vs. America
- ?? ?? ???? ??
- ????, 4????? ???? ?? ??
- Life vs. Career? ??
- e.g., SDE ??? ?? vs. ??? ????
25. Academic Job Search (in USA)
- ???? (???? ??? ??)
- ??? & ????
- ?? & ????
- Research & Teaching Statement
What Seems Necessary
- ????
- Impact?? ?? ? ?
- Resume Screening
- On-site Interview ? ??? Impressive? ???
- Job Talk is a critical part
- ???? ??? Network
- Compelling? Job Talk
- ?? ?? ??? ???
26. Industry Job Search (in USA)
- ????
- Industry-relevant Research (???)
- Industry Network (???)
- Technical Interview
Where to Apply For
- ????
- Job Title: SDE vs. Researcher
- Resume Screening
- On-site Interview ? ?? ?? & ???? & ?? ??
- Tech. Interview is critical
- Visa Sponsor ?? & ??
- Negotiation & Signing
- ?? & Compensation Package
27. Building Your Own Brand (esp. for Industry Career)
- Having a Career may not be Enough!
- ?? ? ???? ?? ??? ??
- Building Your Own Brand
- Blogging & SNS (e.g., Twitter and LinkedIn)
- Conference Participation (e.g., Tutorial)
- Hosting Workshops & Meetups
- Writing Books & Articles
- In 5-10 years, you won't need an employer!
Recommended Visit: http://thenoisychannel.com/