SlideShare feed for Slideshows by User: aelshimi
Fri, 31 Mar 2017 20:26:15 GMT
Predicting Storage Failures with Machine Learning
/slideshow/predicting-storage-failures-with-machine-learning/74099857
My talk from VAULT 2017 - Linux Storage and File Systems Conference: http://events.linuxfoundation.org/events/vault/program/slides

Predicting Storage Failures with Machine Learning, Ahmed El-Shimi

Abstract: Disk drives fail at an average annual rate of ~2%. Any system with Availability and Durability requirements must mitigate such failures through a redundancy technique such as RAID, Erasure Coding, Replication or Backup. With the wealth of monitoring data available nowadays and the ability to process the data in near real-time, can we predict such failures? How well can we do it? And how would that impact how we design and operate large distributed systems? We examine and motivate predictive failure detection in the context of Availability, Rebuild Times and Recovery Objectives of large systems. We then train and evaluate multiple models, achieving accuracy (97.5%) that compares favorably with common datacenter practices. We demonstrate how we can tune our learners to achieve different Precision and Recall objectives, thus improving Availability, Protection or Operational Efficiency.

Speaker: Ahmed El-Shimi has worked in Storage, Distributed Systems, and Cloud for over 15 years. He built technologies such as Deduplication, Automated Tiering, Hybrid Cloud Storage and Data Awareness. He is currently Co-Founder of Minima Inc., a Cloud Data Governance startup. Previously, he led Product for Microsoft's StorSimple Appliance and worked at Microsoft Research and on products such as Microsoft Azure and Windows Server. Ahmed has spoken at LinuxCon, SNIA Storage and Networking World, LOPSA and Microsoft Build. His work has been published at prestigious conferences such as USENIX ATC.
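The abstract's point about tuning learners toward different Precision and Recall objectives can be illustrated with a minimal sketch. It assumes a model that outputs a failure score per drive and a decision threshold above which a drive is flagged. The scores, labels, thresholds and the helper name `precision_recall` are all hypothetical and are not taken from the talk.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when drives scoring >= threshold are flagged."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)  # flagged, failed
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)  # flagged, healthy
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)   # missed failure
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical failure scores for ten drives (label 1 = drive later failed).
scores = [0.95, 0.80, 0.65, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for threshold in (0.7, 0.3):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, the higher threshold flags only drives that really failed but misses one failure, while the lower threshold catches every failure at the cost of flagging healthy drives. Which end to prefer would depend on whether the operator cares more about avoiding unnecessary replacements or about never being surprised by a failure.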

Predicting Storage Failures with Machine Learning from Ahmed El-Shimi
Engineering and Product leader with experience in defining and bringing multiple high-growth SaaS, Infrastructure, and Storage products to market.
Startup Founder | Ex-Microsoft | Ex-Microsoft Research | BU Eng Grad
Extensive global experience: Seattle | Bay Area | Europe | Africa & Middle East
Proven track record of hiring and growing all-star Product and Engineering teams.
Publishing and speaking record at conferences such as USENIX ATC, LinuxCon, Linux Foundation Vault, SNIA SNW, Microsoft Build.
Leading role in 2 Microsoft acquisitions, including leading the technical and product strategy and due diligence team, venture integration, and post-acquisition planning and product release.