Solving Challenges with 'Huge Data'
Solutions & client cases
Dr. Axel Koester - axel.koester@de.ibm.com
Chief Technologist EMEA Storage Competence Center
Chairman of TEC think tank D/A/CH
2
Three ways IT uses data today ...
Procedural (if-then)
Image: Business over Broadway
Statistical (big data)
Machine learning
Image: opendatascience.com
3
... and in 10 years
Procedural
(if-then)
Image: Business over Broadway
Statistical
(big data)
Machine learning
Image: opendatascience.com
4
Current examples
Procedural: business as usual, classic / legacy IT (manual modelling)
Statistical: shopping, profiling, fraud detection (accumulation of examples)
Machine learning: autonomous driving, image classification, chatbots, gaming (automatic modelling)
Images: Business over Broadway, opendatascience.com
5
[Image: inspected samples, one labelled OK and eight labelled defect]
Example of trained (rather than programmed) quality inspection
6
Train-on-the-job by reviewing low-confidence cases
MUCH CHEAPER THAN RE-CODING AT EVERY PROD CHANGE
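The train-on-the-job idea can be sketched roughly as follows. This is a minimal illustration, not the inspection system shown on the slide: the `Prediction` type, the 0.80 threshold, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold: predictions below this confidence go to a human reviewer.
REVIEW_THRESHOLD = 0.80


@dataclass
class Prediction:
    item_id: str
    label: str         # "OK" or "defect"
    confidence: float  # model confidence in `label`, between 0 and 1


def route(predictions, threshold=REVIEW_THRESHOLD):
    """Split predictions into auto-accepted ones and ones queued for human review."""
    accepted, review_queue = [], []
    for p in predictions:
        (accepted if p.confidence >= threshold else review_queue).append(p)
    return accepted, review_queue


def add_reviewed_examples(training_set, review_queue, human_labels):
    """Append the human-confirmed labels so the next training run can use them."""
    for p in review_queue:
        training_set.append((p.item_id, human_labels[p.item_id]))
    return training_set
```

Only the low-confidence cases cost reviewer time, and each reviewed case becomes a new training example, which is where the saving over re-coding would come from.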
7
Procedural:
Archive test cases
for auditing
Statistical:
Parallel processing of
many stored samples
Machine Learning:
Train on sample data, then
archive or trade data
Image: Business over Broadway Image: opendatascience.com
How is data stored?
[Diagram labels: (1) if-then-else, (2) GB/s, (3) GB/s with parallel search]
10
Imperatives for data storage:
implement workflows
avoid "data tourism"
scale without effort
11
DESY: Example of a solved "data tourism" problem
12
DESY data: Synchrotron X-ray imaging
13
Data tourism
Lambda: 60 Gb/s @ 2000 Hz
Eiger: 30 Gb/s @ 2000 Hz
2000 files/s/cam
Web portal access
IBM Spectrum Scale + Workflow rules
3D reconstruction,
research calculus
2000 files/s/cam
MQ
cluster lifecycle
cluster
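The detector figures above imply a per-frame file size, worked out in the short sketch below. It assumes that "Gb" means gigabits (10^9 bits) and that each frame is written as one file, which matches the 2000 files/s/cam label; the function name is made up for the example.

```python
def frame_size_mb(rate_gbit_per_s, frames_per_s):
    """Average size of one frame in megabytes (10^6 bytes)."""
    bits_per_frame = rate_gbit_per_s * 1e9 / frames_per_s
    return bits_per_frame / 8 / 1e6


# Detector rates as given on the slide.
DETECTORS = {"Lambda": (60, 2000), "Eiger": (30, 2000)}

if __name__ == "__main__":
    for name, (gbit_s, hz) in DETECTORS.items():
        print(f"{name}: {frame_size_mb(gbit_s, hz):.3f} MB per frame")
```

Under these assumptions a Lambda frame is about 3.75 MB and an Eiger frame about 1.875 MB, so the storage system sees thousands of small files per second per camera rather than a few large streams.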
14
[Next-gen storage] Prototype wrote 50k Files/sec in one folder*
-- started at 02/28/2017 12:13:13 --
mdtest-1.9.3 was launched with 14 total task(s) on 14 node(s)
Command line used: /ghome/oehmes/mpi/bin/mdtest-pcmpi9131-existingdir -d /gpfs/fs2-1m-me1/shared/mdtest-ec -i 1 -n 35000 -F -w 0 -Z -p 8
Path: /gpfs/fs2-1m-me1/shared
FS: 17.1 TiB Used FS: 0.1% Inodes: 476.8 Mi Used Inodes: 0.1%
14 tasks, 490000 files
SUMMARY: (of 1 iterations)
Operation                  Max            Min           Mean    Std Dev
---------                  ---            ---           ----    -------
File creation   :    50032.690      50032.690      50032.690      0.000
File stat       :  3937604.341    3937604.341    3937604.341      0.000
File read       :   941193.073     941193.073     941193.073      0.000
File removal    :   143095.519     143095.519     143095.519      0.000
Tree creation   :    77672.296      77672.296      77672.296      0.000
Tree removal    :        0.239          0.239          0.239      0.000
-- finished at 02/28/2017 12:13:39 --
(*) In independent folders, the test cluster could write 2.6 million 32k files/sec
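The numbers in the mdtest summary can be cross-checked with a few lines of arithmetic. The sketch below assumes the reported rates are operations per second; the constant and function names are chosen for the example only.

```python
# Figures copied from the mdtest output above.
TASKS = 14
FILES_PER_TASK = 35_000  # the -n value on the command line
RATES = {
    "File creation": 50032.690,
    "File stat": 3937604.341,
    "File read": 941193.073,
    "File removal": 143095.519,
}

total_files = TASKS * FILES_PER_TASK


def phase_seconds(rate_per_s, files=total_files):
    """Time one phase needs to touch every file at the reported rate."""
    return files / rate_per_s


if __name__ == "__main__":
    print(f"total files: {total_files}")
    for name, rate in RATES.items():
        print(f"{name:<14} {phase_seconds(rate):6.2f} s")
```

The product of tasks and files per task reproduces the "490000 files" line, and the creation phase works out to a little under ten seconds, which fits inside the 26 seconds between the start and finish timestamps.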
15
More Workflow Examples
16
Newly acquired evidence data:
• Automatic generation of an immutable copy before the investigation
• Life cycle management adjusted to investigation requirements
• Life cycle management of the immutable copy fully automated (according to law)
Workflow Automation: Preserving crime evidence data
Workflow included + Immutability included
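The first step of this workflow, taking a protected copy before anyone works on the data, can be illustrated with the standard library alone. This is only a sketch under stated assumptions: the function names and the vault directory are hypothetical, and clearing the write bits is a stand-in for the storage-level immutability the slide refers to, not an equivalent of it.

```python
import hashlib
import os
import shutil
import stat
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large evidence files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_evidence(source, vault_dir):
    """Copy `source` into the vault, verify the copy, then make it read-only."""
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)
    target = vault / Path(source).name
    if target.exists():
        raise FileExistsError(f"{target} is already preserved")
    shutil.copy2(source, target)
    checksum = sha256_of(target)
    if checksum != sha256_of(source):
        raise IOError("preserved copy does not match the source")
    # Stand-in for real immutability: remove all write permission bits.
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP)
    return target, checksum
```

Recording the returned checksum alongside the case file lets a later audit confirm that the preserved copy is unchanged.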
17
Heavily used in broadcasting, but also for:
• CCTV (highlighting, automatic archiving & deletion)
• Medical tomography scans
• Fingerprint processing (association, feature extraction, distribution)
• Legal rich media document processing
Workflow Automation: Handling connected documents
IBM AREMA: Archive and Essence Manager
[Logos: used by ... and many more]
18
The mother of all data projects
Square Kilometre Array (SKA)
19
Radio Interferometry data capture: Square Kilometre Array (SKA)
will be the world's largest radio telescope
• 900 stations
• 300 antennas / station
• construction planned to begin in 2018
Substantial technological challenges
• 160 terabytes of raw data collected per second
• 1 petabyte of data stored per day
• 1000 petaflops of processing power
IBM's R&D involvement since 2012
• Research collaboration with Astron (Dutch Institute of Radio Astronomy)
• Storage aspects
• ExaPlan: planning tool for multi-tiered exascale storage
• Tape library modeling and simulation
• Predictive caching
Image: Artist's rendering of the SKA
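Putting the two data-volume figures side by side shows how much reduction has to happen between capture and storage. The sketch below is plain arithmetic on the slide's numbers and assumes decimal units (1 PB = 1000 TB); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

RAW_TB_PER_SECOND = 160   # from the slide
STORED_PB_PER_DAY = 1     # from the slide
TB_PER_PB = 1000          # assumption: decimal units

raw_pb_per_day = RAW_TB_PER_SECOND * SECONDS_PER_DAY / TB_PER_PB
reduction_factor = raw_pb_per_day / STORED_PB_PER_DAY

if __name__ == "__main__":
    print(f"raw data per day: {raw_pb_per_day:,.0f} PB")
    print(f"capture-to-storage reduction: about {reduction_factor:,.0f} : 1")
```

On these figures roughly 13,824 PB are captured per day for every petabyte kept, so nearly all of the raw stream has to be processed and discarded in flight.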
20
For everyone else:
Build your private cloud foundation
21
S3-compatible Private Cloud as "everybody's offload storage"
driven by public cloud pricing, reducing cost by enhancing storage footprint efficiency
Organization-wide S3-compatible repository
IBM Cloud Object Storage
x86 image (contains OS)
Offload snapshots
Offload stale
volumes
IBM Spectrum Virtualize IBM Spectrum Scale
Multi-vendor
block storage
IBM file clusters (NAS)
SMB/CIFS NFS
POSIX HDFS
Disk Tape Flash
Offload old files
Offload snapshots
Cloud backup
IBM Spectrum Protect
IBM backup
Cloud backup
Cloud-2-Cloud
migration
Systems
VMs
Users
Archive
SEC-legal retention mode + deletion hold per object
available as appliances
22
All-or-Nothing-Transform (AONT) for safety, reliability and security
5 nines write availability, 6 nines read availability,
15+ nines reliability against data loss (3 sites)
IBM Cloud Object Storage
x86 image (contains OS)
Geographical Information Dispersal Algorithm
E.g. "encode data in 12 slices, needs 7 slices for decoding"
[Diagram: slices are stored on JBODs; a single slice is undecipherable]
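The "12 slices, 7 needed" example can be turned into numbers with a short calculation. This sketch does not implement the dispersal algorithm itself; it only works out the storage expansion, the number of slices that may be lost, and the chance that a read succeeds if each slice is independently reachable with probability `p_slice`. That independence model and all names are assumptions for illustration.

```python
from math import comb


def expansion_factor(width, threshold):
    """Raw capacity consumed per unit of user data."""
    return width / threshold


def tolerated_losses(width, threshold):
    """Number of slices that can be missing while data stays readable."""
    return width - threshold


def read_availability(width, threshold, p_slice):
    """Probability that at least `threshold` of `width` slices are reachable."""
    return sum(
        comb(width, k) * p_slice**k * (1 - p_slice) ** (width - k)
        for k in range(threshold, width + 1)
    )


if __name__ == "__main__":
    print(f"expansion: {expansion_factor(12, 7):.2f}x")
    print(f"slices that may be lost: {tolerated_losses(12, 7)}")
    print(f"read availability at p=0.99: {read_availability(12, 7, 0.99):.12f}")
```

For comparison, keeping three full copies costs 3x raw capacity and survives two losses, while the 12/7 layout costs about 1.71x and survives five.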
23
How Sky avoids bottlenecks, service outages and hacking
• Object access is lightweight & secure, resulting in low CPU footprint & cost
browser obtains object ID (movie)
24
Bonus
Artificial Intelligence Research for
Storage Management
25
AI learns to predict ideal storage based on meta-information
G. Cherubini, J. Jelitto, V. Venkatesan, Cognitive Storage for Big Data, Computer, April 2016
26
Data Life Cycle Prediction based on experience
Life cycles of different data types
Prediction Quality
10% Training: 95% Success
worst case (low predictable data class)
27
Data Prioritization Prediction after Blackout recovery

Recovery relevance
(Synchronous? Consistent? Expendable?)
Prediction Quality
important transactions, no loss tolerated
Temp Data
ibm.biz/AxelKoester
29
Quantum Computer:
Nobody needs one at home
Ken Olsen, founder of Digital Equipment Corporation, 1977
30
Image: IBM Quantum Computing scientists Hanhee Paik (left) and Sarah Sheldon (right)
38
January 2018: 50 qubits
39
Quantum Computer:
Nobody needs one at home
Search for "IBM Quantum Experience"
https://quantumexperience.ng.bluemix.net/qx
ibm.biz/AxelKoester


Solving Challenges With 'Huge Data'

  • 1. Solving Challenges with 'Huge Data' Solutions & client cases Dr. Axel Koester - axel.koester@de.ibm.com Chief Technologist EMEA Storage Competence Center Chairman of TEC think tank D/A/CH
  • 2. 2 Three ways how IT uses data today Procedural (ifthen) Image: Business over Broadway Statistical (big data) Machine learning Image: opendatascience.com
  • 3. 3 and in 10 years Procedural (ifthen) Image: Business over Broadway Statistical (big data) Machine learning Image: opendatascience.com
  • 4. 4 Current examples Image: Business over Broadway Image: opendatascience.com shopping, profiling, fraud detection autonomous driving, image classification, chatbots, gaming Manual modelling Accumulation of examples Automatic modelling business as usual classic / legacy IT
  • 6. 6 Train-on-the-job by reviewing low-confidence cases MUCH CHEAPER THAN RE-CODING AT EVERY PROD CHANGE
  • 7. How is data stored? Procedural: archive test cases for auditing. Statistical: parallel processing of many stored samples. Machine learning: train on sample data, then archive or trade the data. Images: Business over Broadway, opendatascience.com. [Diagram labels: if…then…else, GB/s, parallel search]
  • 8. Imperatives for data storage: implement workflows; avoid "data tourism"; scale without effort
  • 9. DESY: example of a solved "data tourism" problem
  • 10. DESY data: synchrotron X-ray imaging
  • 11. Data tourism. Detectors: Lambda, 60 Gb/s @ 2000 Hz; Eiger, 30 Gb/s @ 2000 Hz; 2000 files/s per camera. IBM Spectrum Scale + workflow rules. 3D reconstruction, research calculus. Web portal access. [Diagram labels: MQ, cluster, cluster lifecycle]
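As a rough back-of-envelope check of the detector figures on this slide, the sketch below derives the average size of one frame from the quoted line rate and frame rate. It assumes that Gb/s means 10^9 bits per second and that each frame becomes one file (the slide quotes 2000 files/s per camera); the function name is illustrative and not part of any DESY or IBM tooling.

```python
def frame_size_mb(gbit_per_s: float, frames_per_s: float) -> float:
    """Average frame size in megabytes (10^6 bytes) for a given line rate."""
    bytes_per_s = gbit_per_s * 1e9 / 8      # assumption: 1 Gb = 10^9 bits
    return bytes_per_s / frames_per_s / 1e6


# Line rates in Gb/s as quoted on the slide; both detectors run at 2000 Hz.
detectors = {"Lambda": 60.0, "Eiger": 30.0}
sizes = {name: frame_size_mb(rate, 2000) for name, rate in detectors.items()}

for name, size in sizes.items():
    print(f"{name}: {size:.3f} MB per frame")
```

Under these assumptions each file is only a few megabytes, so the storage challenge on this slide is the file creation rate at least as much as the raw bandwidth.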
  • 12. [Next-gen storage] Prototype wrote 50k files/sec in one folder*

      -- started at 02/28/2017 12:13:13 --
      mdtest-1.9.3 was launched with 14 total task(s) on 14 node(s)
      Command line used: /ghome/oehmes/mpi/bin/mdtest-pcmpi9131-existingdir -d /gpfs/fs2-1m-me1/shared/mdtest-ec -i 1 -n 35000 -F -w 0 -Z -p 8
      Path: /gpfs/fs2-1m-me1/shared
      FS: 17.1 TiB   Used FS: 0.1%   Inodes: 476.8 Mi   Used Inodes: 0.1%
      14 tasks, 490000 files

      SUMMARY: (of 1 iterations)
         Operation              Max           Min          Mean    Std Dev
         ---------              ---           ---          ----    -------
         File creation  :     50032.690     50032.690     50032.690    0.000
         File stat      :   3937604.341   3937604.341   3937604.341    0.000
         File read      :    941193.073    941193.073    941193.073    0.000
         File removal   :    143095.519    143095.519    143095.519    0.000
         Tree creation  :     77672.296     77672.296     77672.296    0.000
         Tree removal   :         0.239         0.239         0.239    0.000
      -- finished at 02/28/2017 12:13:39 --

      (*) In independent folders, the test cluster could write 2.6 million 32k files/sec.
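The numbers in the mdtest log can be cross-checked with a little arithmetic. The sketch below takes the task count, the `-n 35000` argument and the reported file creation rate from the log and derives the total file count and the approximate duration of the creation phase. The variable names are illustrative; treating `-n` as the per-task file count is an assumption, which the log's own "14 tasks, 490000 files" line supports.

```python
tasks = 14                    # "14 total task(s) on 14 node(s)"
files_per_task = 35_000       # the -n 35000 argument (assumed to be per task)
create_rate = 50_032.690      # files/s, "File creation" row of the summary

total_files = tasks * files_per_task
create_seconds = total_files / create_rate

print(f"total files: {total_files}")
print(f"creation phase: about {create_seconds:.1f} s")
```

The creation phase therefore accounts for roughly ten of the 26 seconds between the start and finish timestamps; the remaining time covers the stat, read and removal phases.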
  • 14. Workflow automation: preserving crime evidence data. Newly acquired evidence data: automatic generation of an immutable copy before the investigation; life cycle management adjusted to investigation requirements; life cycle management of the immutable copy fully automated (according to law). Workflow included + immutability included
  • 15. Workflow automation: handling connected documents. IBM AREMA (Archive and Essence Manager): heavily used in broadcasting, but also for CCTV (highlighting, automatic archiving & deletion), medical tomography scans, fingerprint processing (association, feature extraction, distribution), legal rich media document processing, and many more
  • 16. The mother of all data projects: Square Kilometre Array (SKA)
  • 17. Radio interferometry data capture. The Square Kilometre Array (SKA) will be the world's largest radio telescope: 900 stations; 300 antennas per station; start of construction planned for 2018. Substantial technological challenges: 160 terabytes of raw data collected per second; 1 petabyte of data stored per day; 1000 petaflops of processing power. IBM's R&D involvement since 2012: research collaboration with Astron (Dutch Institute of Radio Astronomy); storage aspects; ExaPlan, a planning tool for multi-tiered exascale storage; tape library modeling and simulation; predictive caching. Image: artist's rendering of the SKA
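To put the two SKA data rates side by side, the following sketch converts the raw capture rate to a daily volume and compares it with the stored volume. It assumes decimal units (1 PB = 1000 TB) and continuous capture around the clock, neither of which the slide states; the constant names are illustrative.

```python
RAW_TB_PER_S = 160            # raw data collected per second (from the slide)
STORED_PB_PER_DAY = 1         # data stored per day (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: decimal units (1 PB = 1000 TB) and uninterrupted capture.
raw_pb_per_day = RAW_TB_PER_S * SECONDS_PER_DAY / 1000
reduction_factor = raw_pb_per_day / STORED_PB_PER_DAY

print(f"raw capture: {raw_pb_per_day:,.0f} PB/day")
print(f"implied reduction before storage: about {reduction_factor:,.0f} : 1")
```

Under these assumptions only a tiny fraction of the captured data is ever stored, which is why the processing power figure appears on the same slide as the storage figures.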
  • 18. For everyone else: build your private cloud foundation
  • 19. S3-compatible private cloud as "everybody's offload storage": driven by public cloud pricing, reducing cost by enhancing storage footprint efficiency. Organization-wide S3-compatible repository: IBM Cloud Object Storage, x86 image (contains OS), available as appliances. Components: IBM Spectrum Virtualize (multi-vendor block storage); IBM Spectrum Scale (IBM file clusters / NAS: SMB/CIFS, NFS, POSIX, HDFS; flash, disk, tape); IBM Spectrum Protect (IBM backup). Offload paths: snapshots, stale volumes, old files, cloud backup, cloud-to-cloud migration. Consumers: systems, VMs, users. Archive: SEC-legal retention mode + deletion hold per object
  • 20. All-or-Nothing Transform (AONT) for safety, reliability and security: 5 nines write availability, 6 nines read availability, 15+ nines reliability against data loss (3 sites). IBM Cloud Object Storage, x86 image (contains OS). Geographical Information Dispersal Algorithm, e.g. "encode data in 12 slices, needs 7 slices for decoding". [Diagram labels: JBOD, undecipherable]
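The "12 slices, 7 needed for decoding" example can be explored with a small calculation. The sketch below is not IBM's algorithm; it only computes what any 7-of-12 dispersal scheme implies: the raw capacity overhead, the number of slices that may be lost, and the probability that an object stays readable if every slice is independently reachable with the same probability. The per-slice availability of 0.99 and the function name are hypothetical, and the result is not meant to reproduce the availability figures quoted on the slide.

```python
from math import comb


def dispersal_stats(n: int, k: int, slice_availability: float):
    """Return (storage expansion, tolerated slice losses, P(object readable))."""
    p = slice_availability
    expansion = n / k               # raw capacity needed per byte of user data
    tolerated_losses = n - k        # slices that may be missing at read time
    # Binomial tail: probability that at least k of the n slices are reachable,
    # assuming independent slices with identical availability p.
    readable = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return expansion, tolerated_losses, readable


# 12 and 7 come from the slide; 0.99 is a hypothetical per-slice availability.
expansion, losses, readable = dispersal_stats(n=12, k=7, slice_availability=0.99)
print(f"storage expansion: {expansion:.2f}x")
print(f"tolerated slice losses: {losses}")
print(f"P(at least 7 of 12 slices reachable): {readable:.12f}")
```

For comparison, keeping three full replicas costs 3x raw capacity and tolerates two losses, whereas the 7-of-12 layout costs well under 2x and tolerates five.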
  • 21. How Sky avoids bottlenecks, service outages and hacking: object access is lightweight & secure, resulting in low CPU footprint & cost. Browser obtains object ID (movie)
  • 23. AI learns to predict ideal storage based on meta-information. Source: G. Cherubini, J. Jelitto, V. Venkatesan, "Cognitive Storage for Big Data", Computer, April 2016
  • 24. Data life cycle prediction based on experience: life cycles of different data types. Prediction quality with 10% training: 95% success in the worst case (low predictable data class)
  • 25. Data prioritization prediction after blackout recovery. Recovery relevance (synchronous? consistent? expendable?). Prediction quality. [Diagram labels: important transactions, no loss tolerated; temp data; t; R]
  • 27. Quantum Computer: "Nobody needs one at home". Ken Olsen, founder of Digital Equipment Corporation, 1977
  • 29. IBM quantum computing scientists Hanhee Paik (left) and Sarah Sheldon (right)
  • 35. Quantum Computer: "Nobody needs one at home". Search for "IBM quantum experience": https://quantumexperience.ng.bluemix.net/qx