ACENET: Parallel computing with Dask
Parallel computing is the business of breaking a large problem into tens, hundreds, or even thousands of smaller problems which can then be solved at the same time using a cluster of computers or a supercomputer. It can reduce processing time to a fraction of what it would be on a desktop or workstation, or enable you to tackle larger, more complex problems. It's widely used in big data mining, AI, time-critical simulations, and advanced graphics such as augmented or virtual reality. It's used in fields as diverse as genetics, biotech, GIS, computational fluid dynamics, medical imaging, drug discovery, and agriculture.
Python is a popular language because it makes it easy to create programs quickly, with simple syntax and a "batteries included" philosophy. However, the language has some drawbacks. It is notoriously difficult to parallelize because of a component called the global interpreter lock, and Python programs typically take many times longer to run than programs written in compiled languages such as Fortran, C, and C++, making Python less attractive for performance-critical work. Dask was developed to address the first problem, parallelism, by constructing task graphs that can be executed on a variety of parallel hardware configurations. The second problem, performance, can be addressed by converting performance-critical parts into a compiled language such as C/C++ nearly automatically using Cython. Together, Cython and Dask can be used to improve both the performance and parallelism of Python programs.
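To give a flavour of how Dask builds and executes task graphs, here is a minimal sketch using dask.delayed. It assumes the dask package is installed; the square and total functions are purely illustrative, not part of Dask itself.

```python
# Minimal sketch of Dask's task-graph approach (assumes the dask package is
# installed; square() and total() are illustrative functions, not part of Dask).
from dask import delayed

@delayed
def square(x):
    # Each call becomes a node in a task graph instead of running immediately.
    return x ** 2

@delayed
def total(values):
    return sum(values)

# Build the graph lazily; nothing has been computed yet.
squares = [square(i) for i in range(10)]
result = total(squares)

# Execute the graph; independent tasks can run in parallel on threads,
# processes, or a distributed cluster, depending on the chosen scheduler.
print(result.compute())
```

Because the graph is only described up front, the same program can be run unchanged on a laptop or scaled out to a cluster by swapping the scheduler.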
Prerequisites: Before you take this training, you should...
- have taken
- have familiarity with Python programming
This session will take place on:
- Tuesday, June 4, 2:00--4:00 pm
- Thursday, June 6, 2:00--4:00 pm