Dask: From Scratch to Scalable Analytics in Python! :)

A Set of Practical, Powerful, and Sexy Libraries for Working with Machine and Deep Learning!

Dask is a set of flexible libraries for parallel computing in Python. It consists of two parts:

  • Dynamic Task Scheduling: It’s like Airflow, Luigi, Celery, or Make, but optimized for interactive computing workloads.

  • Custom types for “Big Data”: parallel arrays, dataframes, and lists that extend standard interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
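
To make the two parts concrete, here is a minimal sketch, assuming Dask and NumPy are installed and the default local scheduler is used. The helper functions `inc` and `add` and the array sizes are illustrative choices, not something prescribed by Dask. The first half builds a small task graph with `dask.delayed`; the second half uses a chunked `dask.array` that mirrors the NumPy interface.

```python
import dask
import dask.array as da
import numpy as np

# Part 1: dynamic task scheduling.
# `inc` and `add` are hypothetical helpers used only for illustration.
@dask.delayed
def inc(x):
    return x + 1

@dask.delayed
def add(x, y):
    return x + y

# These calls do not compute anything yet; they only build a task graph.
total = add(inc(1), inc(2))

# The scheduler runs the graph when compute() is called.
result = total.compute()

# Part 2: a parallel collection that mirrors the NumPy interface.
# The array is split into 10 chunks of 100,000 elements each.
x = da.arange(1_000_000, chunks=100_000, dtype="int64")

# sum() is lazy as well; compute() reduces each chunk and combines the results.
chunked_sum = x.sum().compute()

# The same reduction in plain NumPy, for comparison.
numpy_sum = np.arange(1_000_000, dtype="int64").sum()

print(result, chunked_sum, numpy_sum)
```

In both halves nothing is evaluated until `compute()` is called, which is what lets the scheduler decide how to run the independent pieces in parallel.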

Beyond these two parts, Dask integrates closely with data science frameworks and libraries and offers customized interfaces that make it easier to use. It is also an open-source project with a large maintainer community and a vast ecosystem of integrations and “daughter” libraries.

Find out more about Dask at: