The document describes how Advanced Data Mining (ADMi) used data mining techniques to create an accurate model of an expansive groundwater system in Florida, the Upper Floridan Aquifer in the Suwannee River Valley, using data from hundreds of wells. Data mining was used to classify wells, assign locations to classes, and create an individual model for each class to predict water levels. The resulting groundwater model integrated all class models and could accurately simulate and predict water levels over time. The data mining model took about 10 weeks to develop, compared with more than 3 years for a conventional finite-difference model, and was more accurate at predicting water levels.
2. Data Mining an Expansive Groundwater System
4. Advanced Data Mining (ADMi) has developed unique Data Mining technology for modeling natural systems. This video demonstrates its application to an expansive groundwater system.
Data Mining extracts valuable knowledge from large amounts of data. It employs advanced methods from several scientific disciplines.
5. The groundwater system of interest is the Upper Floridan Aquifer in the Suwannee River Valley.
6. This system is approximately 100 x 120 miles with a maximum surface elevation of 220 feet.
The following illustration shows its topography. Land elevation is indicated by the key at left. The path of the Suwannee River can be readily seen near the center.
9. This groundwater resource is managed by the Suwannee River Water Management District in Live Oak, Florida.
The District maintains a network of several hundred wells that provide data about the behavior of the aquifer.
10. The following shows the locations of wells for which there are significant amounts of data.
Note that some areas have several wells clustered together and that others have few or none.
12. Histories for a few wells go back to the 1940s; however, the record prior to 1982 is sparse.
The vertical blue streaks in the following 3D image show the historical range of individual wells. Together they show the dynamic range of the aquifer.
14. Collectively, these data comprise a vast but unwieldy source of potentially valuable knowledge.
We researched how Data Mining could be used to extract knowledge about this complex system and others like it.
15. Computer models of groundwater systems are important tools for learning how these invaluable resources are affected by weather, pumping, and land development.
Our goal was to use Data Mining to create an accurate model of the aquifer's water level.
16. The following is a 25 x 30 mile detail from near the center of the system. It shows the positions of 22 wells and their histories since 1982.
Note that the two groups of circled wells clearly behave differently from each other.
18. Because the wells exhibited so many different behaviors, it was necessary to group them into classes. Wells assigned to a particular class behave similarly.
Data Mining optimally determined the number of classes and how the wells would be assigned.
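The slides do not say which clustering method or optimality criterion was used. As a minimal sketch, assuming each well's history is z-score normalized and that k-means with the mean silhouette score stands in for "optimal", the grouping step could look like the following. The functions `zscore`, `kmeans`, `silhouette` and `choose_classes` are hypothetical, not ADMi's code.

```python
import math
import statistics


def zscore(history):
    """Normalize one well's water-level record to zero mean and unit spread."""
    mu = statistics.fmean(history)
    sd = statistics.pstdev(history) or 1.0  # guard against a perfectly flat record
    return [(x - mu) / sd for x in history]


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Seed the next center at the point farthest from every existing center.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a class empties out
                centers[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette score: near 1 means tight, well-separated classes."""
    classes = set(labels)
    if len(classes) < 2:
        return -1.0  # a single class cannot be scored
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c):
            return statistics.fmean(
                [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            )
        if labels.count(labels[i]) == 1:
            scores.append(0.0)  # convention for a single-member class
            continue
        a = mean_dist(labels[i])
        b = min(mean_dist(c) for c in classes if c != labels[i])
        scores.append((b - a) / (max(a, b) or 1.0))
    return statistics.fmean(scores)


def choose_classes(histories, max_classes):
    """Try 2..max_classes classes and keep the count with the best silhouette."""
    points = [zscore(h) for h in histories]
    best_k = max(range(2, max_classes + 1), key=lambda k: silhouette(points, kmeans(points, k)))
    return best_k, kmeans(points, best_k)
```

Normalizing first means wells at different elevations but with the same seasonal shape land in the same class; the silhouette score then penalizes splitting a coherent group just to add classes.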
19. The following shows that 12 classes were used and how the wells were assigned. The classes are numbered 1 to 12.
It was surprising how some classes are distributed over a broad area and are intermingled with other classes.
21. Closer inspection showed that Data Mining did indeed optimally assign the wells.
The following shows the normalized histories of wells for two of the classes.
Note the seasonal variability.
23. The next Data Mining task was to assign aquifer locations to the 12 classes.
Locations were optimally assigned based on their topological characteristics and proximity to wells whose classes were known.
Results are shown in the following.
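The slide names two criteria, a location's characteristics and its proximity to wells of known class, but not how they were combined. A minimal sketch, assuming a distance-weighted k-nearest-neighbor vote in which "distance" is map distance plus a scaled difference in location features (for example, land elevation). `assign_class`, `feature_weight` and the dictionary layout are hypothetical.

```python
import math
from collections import defaultdict


def assign_class(location, wells, k=3, feature_weight=1.0):
    """Assign a location to the class voted by its k most similar classified wells.

    location: dict with "x", "y" (map coordinates) and "features" (numbers such as elevation)
    wells:    list of dicts with the same keys plus a known "cls"
    """
    def distance(w):
        spatial = math.hypot(location["x"] - w["x"], location["y"] - w["y"])
        terrain = math.dist(location["features"], w["features"])
        return spatial + feature_weight * terrain  # feature_weight balances the two units

    nearest = sorted(wells, key=distance)[:k]
    votes = defaultdict(float)
    for w in nearest:
        votes[w["cls"]] += 1.0 / (distance(w) + 1e-9)  # closer wells get larger votes
    return max(votes, key=votes.get)
```

Because miles and feet are summed, `feature_weight` would need tuning in practice; it is the knob that decides how much terrain similarity can outweigh geographic closeness.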
25. The next Data Mining task was to create a water level model for each class. Every location was assigned to a class and, therefore, to a model.
Inputs to each model were the characteristics of a location and the water levels of selected wells. The output was the predicted water level at the location.
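The presentation does not state the form of the class models. As a stand-in, the sketch below fits one ordinary least-squares model per class, where each input vector is a location's characteristics followed by the selected wells' levels. Linear regression is an assumption here, not ADMi's method, and `fit_linear`, `predict` and `fit_class_models` are hypothetical helpers.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of input vectors; a bias term of 1.0 is prepended to each.
    """
    X = [[1.0, *r] for r in rows]
    n = len(X[0])
    # Build the augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefs, inputs):
    """Evaluate one class model: bias plus weighted inputs."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], inputs))


def fit_class_models(samples):
    """samples: (cls, inputs, observed_level) tuples taken from wells with histories."""
    by_class = {}
    for cls, inputs, level in samples:
        xs, ys = by_class.setdefault(cls, ([], []))
        xs.append(inputs)
        ys.append(level)
    return {cls: fit_linear(xs, ys) for cls, (xs, ys) in by_class.items()}
```

Training each class separately is what lets wells with very different behavior (the circled groups on slide 16, for instance) get different input weightings instead of one compromise model.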
26. The models are very accurate. Accuracy can be checked at locations where there are well histories.
The following compares predictions to actual histories for wells of four different classes. The water levels are normalized to land surface elevation.
27. [Figure: Class 1, actual vs. predicted normalized water level above sea level, April 1982 to October 1998]
28. [Figure: Class 3, actual vs. predicted normalized water level above sea level, April 1982 to October 1998]
29. [Figure: Class 6, actual vs. predicted normalized water level above sea level, April 1982 to October 1998]
30. [Figure: Class 10, actual vs. predicted normalized water level above sea level, April 1982 to October 1998]
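The comparisons are shown only as plots; no error statistics are given. One conventional way to put numbers on the agreement at a well with a history is root-mean-square error, mean absolute error and the Pearson correlation between the two series. The `accuracy` helper below is a hypothetical sketch, assuming both series cover the same dates in the same units.

```python
import math
import statistics


def accuracy(actual, predicted):
    """Compare a predicted history with a well's actual record (same dates, same units)."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least two points")
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    mae = statistics.fmean([abs(e) for e in errors])
    # Pearson correlation: does the prediction rise and fall with the record?
    ma, mp = statistics.fmean(actual), statistics.fmean(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    norm = math.sqrt(sum((a - ma) ** 2 for a in actual) * sum((p - mp) ** 2 for p in predicted))
    r = cov / norm if norm else float("nan")
    return {"rmse": rmse, "mae": mae, "r": r}
```

Reporting both kinds of metric matters: a prediction offset by a constant still correlates perfectly (r = 1) while RMSE exposes the bias.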
31. The model of the aquifer is actually a collection of models, one for each class. A computer program was created that integrates the models, a history database, and a graphical user interface.
The following shows a long-term simulation of the aquifer's water level generated by the model. Note the color key at right, and that time is reversed.
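The program is described only by its three parts. A minimal sketch of the integration step, assuming every location already carries a class, each class model is a callable taking `(features, reference_levels)`, and the history database is a mapping from date to observed levels of the selected wells. `simulate` and this data layout are hypothetical, and the graphical interface is left out.

```python
def simulate(locations, models, history, reference_wells):
    """Predict every location's water level on every date in the history database.

    locations:       list of (location_id, cls, features)
    models:          dict cls -> callable(features, reference_levels) -> level
    history:         dict date -> dict well_id -> observed level
    reference_wells: ids of the selected wells whose levels feed every model
    """
    frames = {}
    for date in sorted(history):  # one map-wide "frame" per recorded date
        refs = [history[date][w] for w in reference_wells]
        frames[date] = {
            loc_id: models[cls](features, refs)  # dispatch to the location's class model
            for loc_id, cls, features in locations
        }
    return frames
```

Each frame is one snapshot of the whole aquifer, which is what an animated, color-keyed display like the one on this slide would step through (forward or, as here, in reverse).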
68. Multi-dimensional visualization often reveals important information that would otherwise go unnoticed. ADMi has world-class capabilities in advanced visualization technology.
The following shows the model's prediction of the upper range (ceiling) of the aquifer. The vertical scale is exaggerated to show details.
80. The following compares the model's predictions of the floor and ceiling of the aquifer.
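The slides do not define "ceiling" and "floor" precisely. Assuming they mean the highest and lowest water levels predicted at each location over the simulated period, both can be read directly off a series of simulation snapshots. `floor_and_ceiling` is a hypothetical helper that takes a mapping of date to (location id to predicted level).

```python
def floor_and_ceiling(frames):
    """frames: dict date -> dict location_id -> predicted water level."""
    ceiling, floor = {}, {}
    for levels in frames.values():
        for loc, level in levels.items():
            ceiling[loc] = max(level, ceiling.get(loc, level))  # highest level seen so far
            floor[loc] = min(level, floor.get(loc, level))      # lowest level seen so far
    spread = {loc: ceiling[loc] - floor[loc] for loc in ceiling}  # local dynamic range
    return floor, ceiling, spread
```

The `spread` map is the per-location counterpart of the dynamic range that the blue streaks on slide 12 show for individual wells.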
85. The following shows the predicted aquifer level for the period from January 1995 to October 1998.
Note the spatially asynchronous motions caused by variability in rainfall and in the Suwannee River's stage.
132. This Data Mining-based model required about 10 weeks to develop.
A conventional finite-difference model of the same natural system was developed by a government agency. It took over 3 years to complete! It is also much less accurate at predicting water levels.
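Taking the development times on this slide at face value, and assuming 52 weeks per year, the conventional model took more than fifteen times as long to build:

```latex
\frac{t_{\text{conventional}}}{t_{\text{data mining}}}
  > \frac{3\ \text{yr} \times 52\ \text{wk/yr}}{10\ \text{wk}}
  = \frac{156\ \text{wk}}{10\ \text{wk}}
  \approx 15.6
```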
Conclusions
133. Data Mining is incredibly powerful for extracting knowledge about complex natural systems from databases.
The models can be more accurate than traditional approaches and require much less time to develop.