
Cognitive Computing in LSF
Predicting Job Resource Usage
Spectrum LSF Development
What is cognitive computing?
 IBM defines cognitive computing as "systems that learn at scale, reason with purpose and interact with humans naturally."
 Rather than being explicitly programmed, they learn and reason from their interactions with us and from their experiences with their environment.
 Cognitive computing is becoming popular not only in traditional fields (e.g. computer vision, AI, image searching), but also in more general fields such as EDA:
  Wafer yield analysis
  Congestion prediction in Place and Route
 Can we apply these techniques to make LSF smarter or reduce user error?
When LSF Meets the Cognitive Era
 LSF produces a huge amount of data that is useful for more than problem diagnosis or restarting a cluster:
  Historical job records contain job resource requirements and resource consumption information
  Cluster performance monitoring data and records of system configuration changes
 Cognitive technologies could make LSF smarter:
  Remove the obstacle of mapping user domain knowledge to resource requirements in LSF
  Automatically tune LSF cluster performance by learning the best parameter configurations
  Intelligently predict job resource usage:
   Memory usage
   Runtime
[Figure: the question "How much memory does my job need?" is answered by LSF with "Tell me other information about your job! I will figure out your memory requirements!"]
Predictor Overview
Preliminary prediction verification using LSF customer data
 Prediction targets
  Job maximum memory usage: the maximum memory consumed by a job during its lifecycle in LSF
  Job runtime: the total running time of a job in LSF
 Prediction algorithms
  Machine learning algorithms
   k-nearest neighbors (k-NN): find the k nearest job records and use them to calculate the value for the job to be predicted
   Support Vector Machine (SVM): learns from small samples and helps avoid the curse of dimensionality
  Deep learning networks using MXNet and Caffe
   Build the model by choosing proper hyper-parameters (e.g. number of layers, number of neurons)
 Prediction method
  Use a classification model to predict the range of maximum memory usage
  Use a regression model to predict the continuous value of job runtime
[Figure: prediction pipeline: LSF job events collection, feature extraction, training on processed job features, prediction using trained models]
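As a rough illustration of the k-NN approach and of the classification/regression split described on this slide, the following is a minimal sketch in plain Python. The feature tuples, the training records and the memory range boundaries are all made up for the example; real LSF job events would first need feature extraction and scaling, and nothing here reflects the prototype's actual implementation.

```python
import math
from collections import Counter

# Hypothetical, already-numeric job features: (requested_slots, input_size_gb).
# Each record is (features, max_mem_mb, runtime_s); the values are invented.
TRAIN = [
    ((1.0, 0.5), 900, 120),
    ((1.0, 0.6), 1100, 150),
    ((4.0, 2.0), 3900, 610),
    ((4.0, 2.2), 4300, 650),
    ((8.0, 8.0), 15500, 2400),
    ((8.0, 7.5), 14800, 2300),
]

# Assumed memory ranges (upper bounds in MB) used as classification labels.
MEM_BINS_MB = [1024, 4096, 16384]


def mem_range(mem_mb):
    """Map a memory value to the index of the first range that can hold it."""
    for i, upper in enumerate(MEM_BINS_MB):
        if mem_mb <= upper:
            return i
    return len(MEM_BINS_MB)  # larger than every configured range


def nearest(features, k):
    """Return the k training records closest to `features` (Euclidean distance)."""
    return sorted(TRAIN, key=lambda rec: math.dist(rec[0], features))[:k]


def predict_mem_range(features, k=3):
    """Classification: majority vote over the neighbours' memory ranges."""
    votes = Counter(mem_range(rec[1]) for rec in nearest(features, k))
    return votes.most_common(1)[0][0]


def predict_runtime(features, k=3):
    """Regression: mean runtime of the k nearest neighbours."""
    neighbours = nearest(features, k)
    return sum(rec[2] for rec in neighbours) / len(neighbours)
```

Predicting a range rather than an exact memory value means a small numerical error usually still lands in the right bucket, while runtime is kept continuous because it feeds scheduling estimates directly.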
Job Memory Prediction
Green: Average deviation ratio of user-specified value
Blue: Average deviation ratio of predicted value
 For this client's data set, the users are significantly over-reserving memory for small jobs.
 In this case, the prediction is more accurate than the user-specified values, but there are still errors.
 This means we could potentially run more jobs on these nodes.
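The slides do not define "deviation ratio". One plausible reading, assumed here purely for illustration, is the absolute difference between an estimate and the actual usage, divided by the actual usage. The sketch below computes the average of that ratio for user-specified and predicted values; the numbers are invented and are not the client's data.

```python
def deviation_ratio(estimate, actual):
    """Assumed definition: |estimate - actual| / actual."""
    return abs(estimate - actual) / actual


def average_deviation_ratio(estimates, actuals):
    """Mean deviation ratio over a set of jobs."""
    ratios = [deviation_ratio(e, a) for e, a in zip(estimates, actuals)]
    return sum(ratios) / len(ratios)


# Invented example: three small jobs, all submitted with a 4000 MB reservation.
actual_mb = [500, 1000, 2000]
user_mb = [4000, 4000, 4000]
predicted_mb = [600, 900, 2200]

avg_user = average_deviation_ratio(user_mb, actual_mb)
avg_predicted = average_deviation_ratio(predicted_mb, actual_mb)
```

Under this definition a blanket reservation hurts small jobs the most, since the same absolute over-reservation is divided by a smaller actual usage.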
Job Runtime Prediction
 For this client's data set, the users significantly over-specify the expected runtime for a job (probably just accepting scripted/queue defaults).
 This prediction gives very good results, but again, there are still some errors.
 This means we could potentially give fairly accurate predictions of turnaround time for a set of jobs and/or better backfill scheduling.
Selecting the Model
 There are many different deep learning frameworks: MXNet, Caffe, TensorFlow, Torch, etc.
 Selecting the wrong model will give no useful results.
 You also need the right hyper-parameters to get good results.
 A poor choice of parameters will give a sub-optimal result.
 Selecting the right model and parameters takes time and effort.
 Within Spectrum Computing we have a related project to help automate model and hyper-parameter selection and the training of the model.
[Figure: train/test curves for Caffe and MXNet models, annotated "NOT convergent!!" and "Convergent but not very good"]
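To make the cost of hyper-parameter selection concrete, here is a minimal sketch of a hold-out grid search. It is not the related Spectrum Computing project mentioned above; it swaps in a one-feature k-NN regressor (instead of a deep network) so that it stays short, and the data and candidate values are invented.

```python
def knn_predict(train, x, k):
    """Mean target of the k training points whose feature is closest to x."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def validation_error(train, valid, k):
    """Mean absolute error on held-out jobs for a given k."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in valid) / len(valid)


def select_k(train, valid, candidates):
    """Pick the candidate k with the lowest hold-out error."""
    return min(candidates, key=lambda k: validation_error(train, valid, k))


# Invented (feature, runtime) pairs; the hold-out jobs are not used for fitting.
train = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60)]
valid = [(2.4, 24), (4.6, 46)]

best_k = select_k(train, valid, candidates=[1, 2, 3, 6])
```

Even in this toy setting each candidate requires a full evaluation pass. With deep networks every candidate also needs a training run, which is why automating the search is attractive.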
Open Discussion
 The prototype has shown promising results with sample client data.
 We're looking for your feedback and have packaged the prototype as a VM for you to try.
 It can be used in a passive or active mode for memory and runtime.
 What kinds of job resource requirements are difficult for end users to specify for their jobs?
 What scenarios in your cluster can use the prediction data? Can the prediction errors be tolerated?
 Are there any other scenarios that might use the predictions provided by cognitive computing approaches?
If you are interested in the prototype: LSF Predictor VM
1. Configure and start the VM image
2. Install the data collector in the LSF cluster to ingest historical data into the predictor for model training
 (Passive mode) Run a command to predict some historical jobs, and open a web browser to view reports evaluating the prediction accuracy
 (Active mode) Deploy an esub script to LSF, and use the esub to replace the user-specified memory value with the predicted one for newly submitted jobs
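The active mode could look roughly like the sketch below. It relies on the standard esub mechanism, in which LSF passes submission options in the file named by `LSB_SUB_PARM_FILE` and reads overrides from the file named by `LSB_SUB_MODIFY_FILE`. Everything else is an assumption for illustration: `predict_mem_mb` is a hypothetical stand-in for a call to the predictor, the fixed 2048 value is a placeholder, and the memory unit is assumed to match the cluster's configured unit. The prototype's real esub is not shown in the slides.

```python
import os
import re


def parse_parm_file(text):
    """Parse NAME=value lines (values may be double-quoted) into a dict."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            params[name.strip()] = value.strip().strip('"')
    return params


def predict_mem_mb(params):
    """Hypothetical stand-in for a query to the predictor; returns a placeholder."""
    return 2048


def with_predicted_mem(res_req, mem_mb):
    """Replace an existing mem=N reservation, or append one if no rusage section exists."""
    if re.search(r"\bmem=\d+", res_req):
        return re.sub(r"\bmem=\d+(\.\d+)?", f"mem={mem_mb}", res_req, count=1)
    if "rusage[" not in res_req:
        return f"{res_req} rusage[mem={mem_mb}]".strip()
    return res_req  # rusage without mem: left unchanged in this sketch


def main():
    parm_path = os.environ.get("LSB_SUB_PARM_FILE")
    modify_path = os.environ.get("LSB_SUB_MODIFY_FILE")
    if not parm_path or not modify_path:
        return  # not running as an esub
    with open(parm_path) as f:
        params = parse_parm_file(f.read())
    new_req = with_predicted_mem(params.get("LSB_SUB_RES_REQ", ""), predict_mem_mb(params))
    with open(modify_path, "a") as f:
        f.write(f'LSB_SUB_RES_REQ="{new_req}"\n')


if __name__ == "__main__":
    main()
```

Keeping the rewrite in a pure function (`with_predicted_mem`) makes it possible to check the substitution against historical submissions, as in passive mode, before enabling it for live jobs.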
Thank you.
Job runtime prediction (cont'd)
Customer 2:
30k jobs for training the model
2k jobs for prediction verification
Customer 3:
20k jobs for training the model
2k jobs for prediction verification
Neither customer uses the job-level runtime limit feature in LSF
Predict maximum memory usage of LSF jobs
Copyright © 2016 by International Business Machines Corporation. All rights reserved.
No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation.
Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This document could include technical inaccuracies or
typographical errors. IBM may make improvements and/or changes in the product(s) and/or program(s) described herein at any time without notice. Any statements regarding IBM's future
direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. References in this document to IBM products, programs, or services does
not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Any reference to an IBM Program Product in this
document is not intended to state or imply that only that program product may be used. Any functionally equivalent program, that does not infringe IBM's intellectually property rights, may
be used instead.
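THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT. IBM shall have no responsibility to update this information. IBM products are warranted, if at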
all, according to the terms and conditions of the agreements (e.g., IBM Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which
they are provided. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM
has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. IBM
makes no representations or warranties, expressed or implied, regarding non-IBM products and services.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents or copyrights. Inquiries regarding patent or copyright
licenses should be made, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785 U.S.A.
Legal notices
Information and trademarks
IBM, the IBM logo, ibm.com, IBM System Storage, IBM Spectrum Storage, IBM Spectrum Control, IBM Spectrum Protect, IBM Spectrum Archive, IBM Spectrum Virtualize, IBM Spectrum Scale, IBM Spectrum Accelerate, Softlayer, and XIV are
trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at http://www.ibm.com/legal/copytrade.shtml
The following are trademarks or registered trademarks of other companies.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
IT Infrastructure Library is a Registered Trade Mark of AXELOS Limited.
Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
ITIL is a Registered Trade Mark of AXELOS Limited.
UNIX is a registered trademark of The Open Group in the United States and other countries.
* All other products may be trademarks or registered trademarks of their respective companies.
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations
such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements
equivalent to the performance ratios stated here.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance
characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business
contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Special notices
This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is
subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area.
Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products
should be addressed to the suppliers of those products.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Send
license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 USA.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or implied.
All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be achieved. Actual
environmental costs and performance characteristics will vary depending on individual client configurations and conditions.
IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and
government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates
and offerings are subject to change, extension or withdrawal without notice.
IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.
All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system
hardware configuration and software design and configuration. Some measurements quoted in this document may have been made on development-level systems. There is no
guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users
of this document should verify the applicable data for their specific environment.
