Applying Schemas for Natural Language Processing, Distributed Systems, Classification, Text Mining, and Data Lakes
For six years, my company has focused on building commercial consumer-facing systems based on Linked Data sources such as Freebase, DBpedia, and Wikidata. I created :BaseKB, the first correct conversion of Freebase to RDF, which ensures that Freebase data will live on after Google shutters the service.
From this experience, we have developed methods for matching (i) syntax and schemas and (ii) instance data (specific things such as people, places, and legal entities). These methods use expressive business rules running inside a scalable fabric such as Spark or Hadoop to rapidly understand and clean up data from "data lakes" and other large collections. This technology also applies to communication...
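To give a flavor of what rule-based instance matching looks like, here is a minimal, hypothetical sketch in Python: two records for the same legal entity are linked when their normalized names agree and their countries do not conflict. The record fields, normalization steps, and rule are illustrative assumptions, not the production system; at scale, rules like these would run as functions distributed over a fabric such as Spark.

```python
import re

def normalize_name(name):
    """Lowercase, strip punctuation and common corporate suffixes."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    for suffix in ("incorporated", "inc", "corp", "ltd", "llc"):
        name = re.sub(rf"\b{suffix}\b", "", name)
    return " ".join(name.split())

def same_entity(a, b):
    """Business rule: identical normalized names and no country conflict."""
    if normalize_name(a["name"]) != normalize_name(b["name"]):
        return False
    # A missing country is treated as compatible with any country.
    return (a.get("country") is None or b.get("country") is None
            or a["country"] == b["country"])

record_a = {"name": "Acme Corp.", "country": "US"}
record_b = {"name": "ACME, Incorporated", "country": "US"}
print(same_entity(record_a, record_b))  # → True
```

The point of expressing matching logic as small, composable rules is that domain experts can review and extend them, while the surrounding fabric handles parallel execution over millions of records.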