16. Approximately 90% of the world’s data is held in unstructured formats
Unstructured or semi-structured information: 90%
Structured numerical or coded information: 10%
Possibilities ..
By http://www.oracle.com, 2012
19. Data Extraction / Natural Language Processing
Scraping a.k.a. Focused Crawling
HTML Tag, DOM Handling
Instance Extraction
Sentiment Analysis (positive/negative)
Topic Modeling (LDA)
Word Counting
Disambiguation
Document Classification
What we do
Semi-Structured
Unstructured
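As a rough illustration of how two items on this slide fit together, the sketch below chains HTML tag handling (the semi-structured side) into word counting (the unstructured side). It is only a minimal example using the Python standard library, not the pipeline the slides describe. The names `TextExtractor`, `extract_text`, `count_words`, and `SAMPLE_HTML` are hypothetical and introduced here for illustration; a real focused crawler would also need to fetch pages and follow links.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Data extraction step: reduce an HTML page to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def count_words(text):
    """NLP step: lowercase, split into word tokens, and count them."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


# Hypothetical sample page, made up for this example.
SAMPLE_HTML = """
<html><head><style>p { color: red; }</style></head>
<body><h1>Data extraction</h1>
<p>Unstructured data needs extraction. Extraction comes first.</p>
<script>var ignored = "data";</script>
</body></html>
"""

if __name__ == "__main__":
    text = extract_text(SAMPLE_HTML)
    print(count_words(text).most_common(3))
```

The split into two functions mirrors the slide's two columns: the extraction step only knows about tags, and the counting step only sees plain text, so either side could be replaced (for example by a full DOM library or a proper tokenizer) without touching the other.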