The document describes LessSketchy, a machine learning tool that analyzes online posts to warn users about scams. It trains a balanced random forest classification model using a large dataset of posts scraped from the web. The model is trained to identify scams versus legitimate deals by balancing the dataset and bootstrapping trees from balanced samples to address the class imbalance between scams and real posts. The tool aims to help users avoid scams when searching for deals online.
12. Balanced Random Forest
Used to train a scam vs. legit classifier
Handles an imbalanced sample (1 : 99)
Each tree is trained by bootstrapping a balanced training sample
Aggregate the classifications of all trees
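The bullets above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind LessSketchy: the function names (`balanced_bootstrap`, `fit_stump`, `fit_forest`, `predict`) are hypothetical, the toy data is made up to mimic the 1 : 99 ratio, and a one-feature threshold "stump" stands in for a full decision tree to keep the example short. Aggregation is assumed to be a simple majority vote.

```python
import random
from collections import Counter

def balanced_bootstrap(X, y, rng):
    """Draw a bootstrap sample containing the same number of rows per class."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n = min(len(idx) for idx in by_class.values())  # minority class size
    picked = []
    for idx in by_class.values():
        picked.extend(rng.choices(idx, k=n))  # sample with replacement
    return [X[i] for i in picked], [y[i] for i in picked]

def fit_stump(X, y, rng):
    """Stand-in for a tree: best threshold on one randomly chosen feature."""
    f = rng.randrange(len(X[0]))
    best = None
    for t in sorted({row[f] for row in X}):
        for hi_label in (0, 1):  # label assigned when row[f] >= t
            correct = sum(
                (hi_label if row[f] >= t else 1 - hi_label) == label
                for row, label in zip(X, y)
            )
            if best is None or correct > best[0]:
                best = (correct, t, hi_label)
    _, t, hi_label = best
    return lambda row: hi_label if row[f] >= t else 1 - hi_label

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on its own balanced bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        Xb, yb = balanced_bootstrap(X, y, rng)
        forest.append(fit_stump(Xb, yb, rng))
    return forest

def predict(forest, row):
    """Aggregate the per-tree classifications by majority vote."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]

# Made-up toy data at a 1 : 99 ratio: 5 "scam" rows (1), 495 "legit" rows (0).
data_rng = random.Random(42)
X = [[data_rng.uniform(0.8, 0.95), data_rng.uniform(0.8, 0.95)] for _ in range(5)]
X += [[data_rng.uniform(0.0, 0.5), data_rng.uniform(0.0, 0.5)] for _ in range(495)]
y = [1] * 5 + [0] * 495

forest = fit_forest(X, y)
print(predict(forest, [1.0, 1.0]), predict(forest, [0.1, 0.1]))
```

Because every bootstrap sample holds equal numbers of scam and legit rows, no single tree can score well just by predicting "legit" for everything, which is the failure mode a 1 : 99 sample invites. An odd `n_trees` avoids tied votes for two classes.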
15. About Shih-Ho Cheng
B.S. Physics / Math
Univ. of Virginia
Ph.D. Astro-Particle Physics
Penn State Univ.
Love board games!