DT Core Analytical Competencies
 Data Engineering
 Data Architecture Design and Development
 Large Scale Enterprise Architecture and Design
 Migrate, Extract, Transform, and Load Data
 Spatial, Multi-Domain, and Cloud-Based Data Services

Analytics - Quantitative
 Data Transformation and Ingestion
 Dissemination and Reporting Tools
 Data Mining, Exploitation, and Correlation Tools
 Spatial Data Mining and Geographic Knowledge Discovery




                                 Data Tactics Corporation Proprietary and Confidential Material
DT Core Analytical Competencies
The Team:

Graduates of top-tier universities, including Stanford, Caltech, and MIT, with ties to these and local universities.

Degrees include Mathematics, Computer Science, Aeronautical Engineering, Astrophysics, Electrical Engineering, Mechanical Engineering, Statistics, and Social Sciences.

Competencies include data mining, machine learning, statistics, spatial statistics, Bayesian statistics, econometrics, computational geometry, spatial econometrics, applied mathematics, theoretical robotics, dynamical systems, and control theory.

Foci include unsupervised cross-modal clustering algorithms, principal component analysis, independent component analysis, regression, spatial regression, geographically weighted regression, zeroth-order processing, nonlinear optimization, autoregressive models, time-series analysis, spatial regime models, and HAC models.

Technical Competencies include
Data Tactics Analytics Cell




Analytics Competencies




 Time Series Analytics (i)
    Applying the ARIMA model in a parallelized environment to provide anomaly detection
 Correlation Analytics (ii)
    Brute-force pairwise Pearson's correlation over vectors in a cloud-backed engine
 Aggregation Analytics (iii)
    Aggregate micro-pathing
       Repurposing data to analyze and display movement patterns
    Dwell time calculations
       Analytic to discover areas of interest based on movement activity
 Graph Analytics (iv)
    Discovering social interaction models and paradigms within network data

 [Figures: panels (i) to (iv), one per analytic; panel (i) is a time-series plot with y-axis ZeroFill and x-axis Index]
Analytics Competencies
 Directional Spatio-Temporal Analytics (i)
    Compare distributions with a focus on changes in the morphology of the distribution and the mobility of individual observations within the distribution over the same period of time and over space (Wy)
 Local Classification (ii)
    Non-self-similarities and self-similarities; within-group and between-group correlations
 Ecological Analytics
    Regression Modeling
       Spatial Regression
       Spatial Regime Models
       HAC Models

 [Figures: panels (i) and (ii)]
Data Tactics Data Repository




Quantitative Data Competencies
 Proxy problem definition: Different problems lead to different questions, which lead to different data sets. The definition of the proxy problems determines whether a data source is acceptable.
 Key dimensions of variability: Key dimensions such as time, space, and identifier were targeted for collection. However, different proxy problems require different key dimensions.
 Capturing scope: The following were explicitly captured:
     Data structure (e.g., graph relationship data vs. graph transaction data vs. dimensional data)
     Data timespan (if time is a dimension)
     Data geospatial footprint (if geospatial is a dimension)
     Data volume (both in total GB and in total number of rows)
     Dataset overlap
 Capturing opinions: Current star ratings are based on:
     Data consistency, volume, and persistence
     Data coverage (time and space)
     Data precision (time and space)
     Data genuineness (synthesized data is penalized)
     Data distribution (e.g., we may have extremely precise geospatial data, but if there are only 40 unique geospatial points in the data, the geospatial aspects aren't that interesting)
     Data dimensionality (higher dimensionality with reasonable distributions on each dimension is preferred)
Quantitative Data Holdings
 Fields recorded for each data holding:
     Name of the data source
     Initial reviewer
     Opinion of data quality
     Description and notes on the data source, as well as collection information
     Collection start / end dates, if known
     Geospatial coverage
     Source where the data was acquired
     Data handling requirements
     Date that statistics were last collected on the data
     Location of the data on the FTP site
     Data format
     Size of the data (storage space and rows)
Quantitative Data Holdings
Armed Conflict Location and Events Dataset (ACLED)
AIS Ship Data
Atmospherics Reports
BrightKite Data
Classified Ads
CNN
Digital Terrain Elevation Data (DTED)
Enron Data
Epinions Data
EU Email
Facebook
Flickr Data
Flight Information Data
Four Square Data
Friend Feed Data
Geolife Data
Gowalla Data
International Conference on Weblogs and Social Media (ICWSM) Data
Identica Data
IMDB Data
Knowledge Discovery and Data Mining (KDD) Tools Competition
KDD 2003 Data
KDD 2005 Data
Kiva Data
Landscan Data
LiveJournal Data
Meme Tracker
Meme Twitter TS
NFL Plays
Night Lights Data
Open Data Airtraffic accidents
Open Street Maps
Panoramio Data
Patent Citations Data
Photobucket Data
Picasa Web Albums Data
Processed Employment Data
Scamper Data
ISVG
Twitter
UNDP
Weather Data
Webgraphs
Youtube
Quantitative Data Competencies
 Panoramio / Flickr: Metadata on uploaded public photos provides excellent geospatial and temporal resolution, along with user information. An estimated 250 million rows of photo metadata, with over 150 million already gathered.
 AIS: Ship-tracking data that provides ship pings as vessels move. Precise time and geospatial information is provided. 50 million records and counting.
 OpenStreetMaps: Over 2 billion geospatial points from mapping enthusiasts' tracks across the world. Time and user ID information is also included.
 Gowalla / Brightkite: About 11 million FourSquare-style check-ins with user, location, and time information.

 Example Proxy Problems:
     Discovering "holes" in the data where photos are no longer taken, to detect avoided areas
     Discovering relationships and links based on co-occurrence between users in time / space
     Tracking and analyzing movement patterns on a local and global scale
     Analyzing image data for changes in the same locations
     Detecting differences in photo activity in an area over time
     Detecting events based on abnormal photo activity
     Mapping UserIds across data sources to create a unified analytic picture
     Detecting the home range of each user
     Defining patterns of life from routine activities and movement
     Tracking language usage in areas to detect abnormal language presence in an area
     Local vs. tourist movement analysis and extraction
     Trending of location popularity



                                               UNCLASSIFIED
Quantitative Data Competencies
 Twitter: A sampled, ongoing collection of tweets with UserId and time. Some include precise location data, but this is not the norm. The collection pulls roughly 1-2 million tweets per day.

 Example Proxy Problems:
     Discovery of crowd-sourced phenomena (e.g., people posting to beware of a certain neighborhood)
     Discovery of correlated trends (e.g., finding that people posting about a certain topic in an area correlates with higher crime in that area)
     Tracking sentiment on certain topics and issues
     Tracking language usage in areas to detect abnormal language presence in an area
Quantitative Data Competencies
 How can we infer movement patterns from vast amounts of what appears to be just point data collected in time and associated with an identifier (e.g., UserId, bank account)?
 The technique is applicable to Twitter, FourSquare, and many other sources.

 [Figure: Volume plot of photos binned by area on a log scale. Paris as seen from Flickr over all time.]
Quantitative Data Competencies
 1.    Goal: to catch active movement between locations a small distance apart
 2.    Typically two to around a dozen points chained together
 3.    Located in a small area, but with a definite path through the area
 4.    Sampled in rapid succession (less than X seconds between points)
 5.    Thousands or millions of micro-paths make a full path to view
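
The chaining rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the deck's implementation: the function names, the use of haversine distance, and the demo coordinates are all hypothetical. The default thresholds (60 seconds, 80 km/hour) are the values the deck reports for its Paris view; the timestamps are the ones shown in the micro-path figure.

```python
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def micro_paths(points, max_gap_s=60.0, max_speed_kmh=80.0):
    """Chain one user's (timestamp, lat, lon) points into micro-paths.

    A segment between consecutive points is kept only if the time gap is
    positive and at most max_gap_s, and the implied speed is at most
    max_speed_kmh. Otherwise the current chain is closed and a new one starts.
    Chains with fewer than two points are discarded.
    """
    paths, current = [], []
    for p in sorted(points):
        if current:
            t0, lat0, lon0 = current[-1]
            gap = (p[0] - t0).total_seconds()
            ok = 0 < gap <= max_gap_s
            if ok:
                speed = haversine_km(lat0, lon0, p[1], p[2]) / (gap / 3600.0)
                ok = speed <= max_speed_kmh
            if not ok:
                if len(current) >= 2:
                    paths.append(current)
                current = []
        current.append(p)
    if len(current) >= 2:
        paths.append(current)
    return paths

def ts(hms):
    """Helper for the demo: a time on 2012-08-15."""
    return datetime.strptime("2012-08-15 " + hms, "%Y-%m-%d %H:%M:%S")

# Timestamps from the figure; the coordinates are made up for illustration.
demo_points = [
    (ts("12:34:59"), 48.8738, 2.2950),
    (ts("12:35:11"), 48.8739, 2.2952),
    (ts("12:35:25"), 48.8740, 2.2954),
    (ts("12:37:25"), 48.8741, 2.2956),  # 120 s after the previous point
    (ts("12:37:35"), 48.8742, 2.2958),
    (ts("12:37:46"), 48.8842, 2.2958),  # about 1.1 km in 11 s: too fast
]
demo_paths = micro_paths(demo_points)
```

With these inputs the 120-second gap and the over-speed jump both close a chain, so the six points yield one three-point and one two-point micro-path. Running the same function per identifier and overlaying the resulting segments is what produces the aggregate views.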
 [Figure: A micro-path example. Photos taken on 2012-08-15 at 12:34:59, 12:35:11, 12:35:25, 12:37:25, 12:37:35, and 12:37:46. One segment is ignored because there are 120 seconds between points; another is ignored because the velocity is too fast. A second panel shows paths for Person A, Person B, and Person C, labeled "3 seconds" and "10 seconds", with a common path pattern forming.]

 Overlay thousands / millions of these tiny micro-paths together and you get...
Quantitative Data Competencies
 View of Paris using a 60-second segment timeout and 80 km/hour cutoff on Flickr data

 [Figure: micro-path map of Paris with labeled landmarks: Arc de Triomphe, Place de la Concorde, Louvre, Eiffel Tower, Notre Dame. Annotations:]
     Apparent typical approach pathway to the Arc
     Place de la Concorde typically approached from southern direction
     Harder to see, but you can see the typical approach / exit pathways from Notre Dame
     Red strip appears to be line of sight to the Eiffel Tower
Quantitative Data Competencies
 [Figure: Aggregate micro-pathing on a world of photo metadata with no speed, time, or distance restrictions]
Quantitative Data Competencies
 [Figure: AIS ship-tracking micro-path blanket with no time / space filters. Annotations: Japan's south coast; China's coast with high levels of activity; coast of Taiwan]
Quantitative Data Competencies
 [Figure panels: "Flickr Paris 2004 changes vs 2005" and "Flickr Paris 2011 changes vs 2010"]

 Hh [HIGH, high]: an increase between X_t1 -> X_t2 relative to the respective (X_t1, X_t2) reference distribution, where t1, t2 belong to T. HIGH reflects a strong increase of one's own values (dx_i) at location i between t1 and t2 relative to the change of neighboring values (dy). high reflects a modest increase of dy relative to values of dx.

 lL [low, LOW]: a decrease between X_t1 -> X_t2 relative to the respective (X_t1, X_t2) reference distribution, where t1, t2 belong to T. low reflects a modest decrease of one's own values (dx_i) at location i between t1 and t2 relative to the change of neighboring values (dy). LOW reflects a strong decrease of neighboring values (dy) relative to values of dx.

 Neighbors are defined with the spatially lagged variable Wy, as the eight nearest observations.
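
As a rough illustration of the neighbor definition above, the sketch below computes a row-standardized spatial lag (the mean over each location's eight nearest observations) and attaches a two-letter label to each location. It rests on several assumptions that the slide does not confirm: planar Euclidean distance, a label rule in which the letter gives the sign of the change and the capital marks whichever of the own change and the neighbor change is larger in magnitude, and no comparison against a reference distribution. All names and demo values are hypothetical.

```python
def knn_spatial_lag(coords, values, k=8):
    """Mean of `values` over each point's k nearest neighbours (excluding itself)."""
    lag = []
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        lag.append(sum(values[j] for j in neighbours) / len(neighbours))
    return lag

def classify_change(own_change, neighbour_change):
    """Assumed reading of the labels: h/l is the sign, the capital is the stronger change."""
    own = "h" if own_change >= 0 else "l"
    nbr = "h" if neighbour_change >= 0 else "l"
    if abs(own_change) >= abs(neighbour_change):
        own = own.upper()
    else:
        nbr = nbr.upper()
    return own + nbr

# Hypothetical 3x3 grid of cells with photo counts at t1 and t2.
coords = [(x, y) for y in range(3) for x in range(3)]
counts_t1 = [10.0] * 9
counts_t2 = [12.0] * 9
counts_t2[4] = 30.0  # the centre cell grows much more than its neighbours

dx = [b - a for a, b in zip(counts_t1, counts_t2)]
dy = knn_spatial_lag(coords, dx, k=8)
labels = [classify_change(own, nbr) for own, nbr in zip(dx, dy)]
```

Here the centre cell is labeled "Hh" (strong own increase, modest neighbor increase) and the corner cells "hH". A production version would use a spatial index rather than the quadratic neighbor search shown here.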
Quantitative Data Competencies
 [Figure: Paris. Number of distinct photographers by day in year. Annotations: New Year provides lots of photos; Bastille Day; recurrent red strips show the recurring weekend]
Quantitative Data Competencies
 [Figure: Caracas. Number of distinct photographers by day in year. Annotations: 5-day Carnival celebration; some interesting dates for low-volume activity]
 Image from www.flickr.com/photos/globovision/6911554143
Quantitative Data Competencies
 Airline Flight Data Anomaly Detection
 During an unusual event, such as the winter storm shown below, the ARIMA model still follows the pattern but doesn't match as well. The areas where the red and black don't match are where unusual events have occurred.

 [Figure: two time-series panels; y-axis ZeroFill (0 to 40), x-axis Index (tick 02-13)]

 [Figure: Plot of the count of points where the difference between the number of flights expected to leave an airport based on the model and the actual observed number of flights was statistically significant.]
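
A minimal sketch of this kind of residual-based detection follows. The deck does not state the model order or the significance test, so everything specific here is an assumption: an ARIMA(1,1,0) model without a constant, fitted by conditional least squares on a clean training window, and a robust threshold of three scaled median absolute deviations on the one-step forecast errors. The function names and the flight counts are hypothetical.

```python
from statistics import median

def one_step_residuals(series, phi):
    """Forecast y[t] as y[t-1] + phi * (y[t-1] - y[t-2]); return the errors for t >= 2."""
    return [
        series[t] - (series[t - 1] + phi * (series[t - 1] - series[t - 2]))
        for t in range(2, len(series))
    ]

def fit_arima_110(train):
    """Least-squares AR(1) coefficient on first differences, plus a robust error scale."""
    d = [b - a for a, b in zip(train, train[1:])]
    num = sum(cur * prev for prev, cur in zip(d, d[1:]))
    den = sum(prev * prev for prev in d[:-1])
    phi = num / den if den else 0.0
    resid = one_step_residuals(train, phi)
    center = median(resid)
    # 1.4826 * MAD estimates the standard deviation for normally distributed errors.
    scale = 1.4826 * median(abs(r - center) for r in resid)
    return phi, center, scale

def flag_anomalies(series, phi, center, scale, k=3.0):
    """Indices whose forecast error is more than k robust deviations from the centre."""
    resid = one_step_residuals(series, phi)
    return [t for t, r in enumerate(resid, start=2) if abs(r - center) > k * scale]

# Hypothetical departures per interval: a regular pattern, then a storm at index 14.
clean = [40, 42, 41, 43, 42, 44, 43, 45, 44, 46, 45, 47]
observed = clean + [46, 48, 20, 49, 48, 50]

phi, center, scale = fit_arima_110(clean)
anomalies = flag_anomalies(observed, phi, center, scale)
```

On this toy series the detector flags the storm interval and the two intervals after it, because their forecasts are built from the disrupted values. In a parallelized setting, one such model would be fitted per airport (or per key) independently.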
Quantitative Data Competencies
 Raw data file: each line is a comma-separated list of values.
     key1, timestamp, value
     key2, timestamp, value
     ...

 Cloud-backed transformation

 Vector file: each line has a key and a comma-separated list of values.
     Key1 2.4,3.4,0.99,...
     Key2 3.4,4.3,1.0,0.6,...
     ...

 Correlation analytic: for each vector, calculate the correlation to each other vector. We use a Pearson correlation.

              Key1     Key2     Key3     Key4
     Key1       -      0.93     0.43     0.001
     Key2       -        -      -0.5     -0.03
     Key3       -        -        -      0.32
     Key4       -        -        -        -

 Implemented in: Python (RAM), Hive, Mahout, Spark, Giraph, Cascalog
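
The two stages above (raw rows to per-key vectors, then brute-force pairwise Pearson correlation) can be sketched in plain in-memory Python. This is an illustration only, with assumptions labeled: timestamps are taken to be numeric, every key is assumed to have a value at the same set of timestamps so that vectors align, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def build_vectors(lines):
    """Group 'key, timestamp, value' rows into one time-ordered vector per key."""
    rows = defaultdict(list)
    for line in lines:
        key, timestamp, value = (field.strip() for field in line.split(","))
        rows[key].append((float(timestamp), float(value)))
    return {key: [v for _, v in sorted(pairs)] for key, pairs in rows.items()}

def pearson(x, y):
    """Pearson correlation of two equal-length vectors; NaN if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)

def pairwise_correlations(vectors):
    """Upper triangle of the correlation matrix: one entry per unordered key pair."""
    return {
        (a, b): pearson(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }

# Hypothetical raw rows in the 'key, timestamp, value' layout.
raw_lines = [
    "key1, 1, 1.0", "key1, 2, 2.0", "key1, 3, 3.0",
    "key2, 1, 2.0", "key2, 2, 4.0", "key2, 3, 6.0",
    "key3, 3, 1.0", "key3, 1, 3.0", "key3, 2, 2.0",
]
vectors = build_vectors(raw_lines)
correlations = pairwise_correlations(vectors)
```

The number of pairs grows as n(n-1)/2, which is why the deck runs the same computation on distributed engines for large key sets.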
Quantitative Data Competencies
 Approximation engine for the O(n²) correlation matrix problem
 Technique based on Google Correlate

 [Diagram labels: Training Engine, Test Engine, Spark]

 The approximation provides orders of magnitude of speedup compared to equivalent brute-force methods. The technique works best for highly correlated items and uses a series of data projections, unsupervised learning, and vector quantization to provide dimensionality reduction for incoming complex vectors.

Capabilities Brief Analytics
