Data Science on HPC platforms
Methods in machine and deep learning are being investigated to either augment or replace traditional simulation, and contemporary trends in hardware and software have enabled the convergence of HPC and AI. In this workshop we survey these trends and explore what is available at the system level and in software to ease the transition to developing data science workloads. The talks include a walkthrough of the current hardware and system resources available to accelerate data science workflows, followed by a catalog of the software tools and frameworks available to develop ML/DL models, accelerate their training, and handle the associated data requirements at scale. Also presented are examples where communities that traditionally rely on classical HPC simulation (e.g. CFD and geoscience) leverage data science methods to accelerate their scientific investigations. Attendees are assumed to have no prior experience with ML/DL workloads.
Sr. Computational Scientist Lead, KAUST
Saber Feki leads the computational and data science and engineering team at the KAUST Supercomputing Core Laboratory, providing support, training, advanced services, and research collaborations for users of the leadership-class Cray XC40 supercomputer Shaheen II and the heterogeneous cluster “Ibex” with over 600 GPUs.
Saber is passionate about technology and enjoys working with users and technology vendors to plan and execute refreshes of the KAUST HPC and AI infrastructure with the latest hardware and software technologies. He also leverages his expertise to support and consult on similar deployments for local and regional organizations such as the American University of Sharjah and the National Center of Meteorology of Saudi Arabia.
Saber received his MSc and Ph.D. degrees in computer science from the University of Houston in 2008 and 2010, respectively. He then joined the oil and gas company TOTAL in 2011 as an HPC Research Scientist. Saber has been working at KAUST since 2012.
Staff Scientist, KAUST
David Robert Pugh is an experienced research software engineer and data scientist who loves to teach.
He recently finished developing training materials to help data scientists get started managing their virtual environments with Conda and Docker, and he is currently developing data engineering solutions to accelerate distributed training of deep neural networks on HPC resources.
He has deep knowledge of the core data science Python stack: NumPy, SciPy, Pandas, Matplotlib, NetworkX, Jupyter, Scikit-Learn, PyTorch, and TensorFlow.
Mohsin Ahmed Shaikh is a Computational Scientist at King Abdullah University of Science and Technology.
Rooh Khurram is a Staff Scientist at the KAUST Supercomputing Lab at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. He has conducted research in finite element methods, high performance computing, multiscale methods, fluid-structure interaction, detached eddy simulations, in-flight icing, and computational wind engineering. He has over 20 years of industrial and academic experience in CFD and specializes in developing custom-made computational codes for industrial and academic applications. His industrial collaborators include Boeing, Bombardier, Bell Helicopter, and Newmerical Technologies Inc. Before joining KAUST in 2012, Rooh worked at the CFD Lab at McGill University and the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. Rooh received his Ph.D. from the University of Illinois at Chicago in 2005. In addition to a Ph.D. in Civil Engineering, Rooh holds degrees in Mechanical Engineering, Nuclear Engineering, and Aerospace Engineering.
DPC++ Training Workshop
This workshop introduces the DPC++ technology, covering the motivation behind it, how to write simple programs with it, and some of the advanced features it offers. Data Parallel C++ is a high-level language designed for data-parallel programming. The intent is to give developers a higher-level alternative to OpenCL and similar languages, making programs portable across different architectures while preserving the ability to write hardware-specific kernels that optimize performance on each platform.
Amr Mohamed Nasreldin Elsayed
Amr Elsayed is an HPC software engineer who has been with Brightskies for four years. A graduate of Alexandria University with a Bachelor's in Computer Engineering, he has contributed to multiple projects during his time at Brightskies, including several collaborations with Intel on leveraging the DAOS filesystem for oil and gas workflows, the open-sourcing of the first seismic imaging code based on DPC++, and on-site consultancy for customers on software design and optimization.
Enabling and democratizing MLOps in Healthcare
The implementation of AI-based systems is having increasing success in the Healthcare Industry, enabling technological advances for both diagnosis and treatment of clinical conditions, as well as for the optimization and improvement of the efficiency of healthcare facility management.
The R&D and deployment of ML-based systems have seen the emergence of so-called MLOps, a framework that aims to solve many of the organizational challenges related to the training and deployment phases.
Implementing an MLOps framework also requires the development of software platforms that give development teams tools to simplify and optimize workflows, reducing potential bottlenecks due, for example, to the management and use of a complex hardware and software infrastructure for the prototyping and development of AI models.
Dr Valerio Rizzo
AI Lead & Solution Architect, Lenovo
Valerio is the AI Lead & Solution Architect for Lenovo and a key member of an expert team of Artificial Intelligence, Machine Learning, and Deep Learning specialists operating within the EMEA field sales organization and its business development team. He is a recognized expert in the fields of neuroscience and neurophysiology, with a ten-year track record in brain research conducted in Italy and the USA.
Power and Cooling
HPC drives innovation and discovery, but increasing demands for performance are driving up the heat of next-generation HPC processors and accelerators. In this workshop we will learn about current and future cooling approaches and the challenge of balancing performance with sustainability requirements.
Leading AI and Solution Engineer, Lenovo
Jim is a technical expert specializing in data centers, specifically power and cooling, with long-term experience in all types of computers and their environments. With a background in MEP and membership in BCS (the British Computer Society), Jim has worked in the IT industry for over 41 years, predominantly in professional services for a variety of large companies, including banks, finance houses, telecoms companies, and hardware manufacturers. During this time he has undertaken several data center construction and renovation projects as a technical and project manager. Now with Dell for over 16 years, he advises and audits on data center issues for Dell customers, supports the Dell sales team, and works with Dell Data Center Infrastructure Partners. He has completed and passed the ITIL Practitioner Exam, the European Union Code of Conduct for the Data Center Exam, and DCD's Certification for Best Practices in Energy Efficiency, and is Dell's only EU-certified data center evaluator. He has also authored articles in several trade magazines and appeared as a speaker at several trade shows. Jim has received in-house media training enabling him to speak publicly and on social media on behalf of Dell.
HPE Cray AI Development Environment Workshop
The HPE Cray AI Development Environment is a machine learning training platform that makes building machine learning models fast and easy.
The software platform enables Machine Learning Engineers and researchers to:
- Train models faster using state-of-the-art distributed training: the platform handles machine provisioning, networking setup, communication optimization between machines, efficient distributed data loading, and fault tolerance.
- Automatically find high-quality models with advanced hyperparameter tuning, including state-of-the-art algorithms developed by the creators of Hyperband and ASHA.
- Efficiently utilize different accelerators (e.g. GPUs): with intelligent and configurable resource management.
- Track, reproduce, and collaborate on experiments: with automatic experiment tracking that works out-of-the-box, covering code versions, metrics, checkpoints, and hyperparameters.
As an end-to-end training platform, the system integrates these features into an easy-to-use, high-performance machine learning and deep learning environment that can be deployed on bare metal, Kubernetes, or the cloud, supporting the largest providers such as AWS, Azure, and GCP.
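As an illustration of how such a training job is described, the sketch below shows a hypothetical experiment configuration in the style of the open-source Determined AI platform, on which the HPE Cray AI Development Environment is based (all names, values, and the `model_def:MNISTTrial` entrypoint are illustrative assumptions, not taken from the workshop):

```yaml
# Hypothetical experiment config (Determined-style field names).
name: mnist-asha-demo
hyperparameters:
  learning_rate:          # searched on a log scale between 1e-4 and 1e-1
    type: log
    base: 10
    minval: -4.0
    maxval: -1.0
  global_batch_size: 64
searcher:
  name: adaptive_asha     # ASHA-based hyperparameter search
  metric: validation_loss
  smaller_is_better: true
  max_trials: 16
resources:
  slots_per_trial: 2      # e.g. two GPUs per trial
entrypoint: model_def:MNISTTrial
```

A config like this would typically be submitted with the platform's CLI (e.g. `det experiment create config.yaml .`), after which the platform schedules trials, tracks metrics, and checkpoints models automatically.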
AI Solutions Engineer, Hewlett Packard Enterprise
Andrea is a presales solution engineer for the AI Strategy and Solution Group.