Volcano Framework

Volcano: Framework for HPC Workloads in Kubernetes

A framework dedicated to running AI/ML and Big Data workloads in Kubernetes

Volcano is a system for running high-performance workloads on Kubernetes. It provides a suite of mechanisms, currently missing from Kubernetes, that many classes of high-performance workloads commonly require, including:

  • Machine learning/deep learning,
  • Bioinformatics/genomics, and
  • Other “big data” applications.

These types of applications typically run on general-purpose domain frameworks such as TensorFlow, Spark, PyTorch, and MPI, all of which Volcano integrates with.

Some examples of the mechanisms and features that Volcano adds to Kubernetes are:

Scheduling extensions, e.g.:

  • Co-scheduling
  • Fair-share scheduling
  • Queue scheduling
  • Preemption and reclaims
  • Reservations and backfills
  • Topology-based scheduling
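
Co-scheduling (also called gang scheduling) is the all-or-nothing idea that a multi-pod job should start only when enough of its pods can be placed at the same time, so that a distributed training run never sits half-started while holding resources. The following is a minimal, hypothetical sketch of that rule, not Volcano's implementation: the `Node`, `Job`, and `gang_schedule` names, the single integer CPU resource, and the first-fit placement are all assumptions made for illustration, and `min_available` simply stands for "the minimum number of pods that must start together".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpu: int  # hypothetical single resource, in whole CPU units


@dataclass
class Job:
    name: str
    pod_cpu: int        # CPU requested by each pod
    replicas: int       # total pods the job wants
    min_available: int  # pods that must be placed together for the job to start


def gang_schedule(job: Job, nodes: List[Node]) -> Optional[Dict[str, str]]:
    """Return a pod -> node placement, or None if fewer than min_available fit.

    Placement is computed against a copy of the free capacity, so a failed
    attempt leaves the nodes untouched (the all-or-nothing property).
    """
    free = {n.name: n.free_cpu for n in nodes}
    placement: Dict[str, str] = {}
    for i in range(job.replicas):
        # First-fit: choose the first node that still has room for one pod.
        target = next((name for name, cpu in free.items() if cpu >= job.pod_cpu), None)
        if target is None:
            break
        free[target] -= job.pod_cpu
        placement[f"{job.name}-{i}"] = target
    if len(placement) < job.min_available:
        return None  # the gang cannot start; reserve nothing
    # Commit the tentative placement only once the gang threshold is met.
    for n in nodes:
        n.free_cpu = free[n.name]
    return placement


if __name__ == "__main__":
    cluster = [Node("n1", 4), Node("n2", 2)]
    print(gang_schedule(Job("big", pod_cpu=2, replicas=4, min_available=4), cluster))
    print(gang_schedule(Job("train", pod_cpu=2, replicas=3, min_available=3), cluster))
```

On the two-node cluster above, the four-pod job fits only three pods, so it is rejected and no capacity is consumed; the three-pod job then gets all three of its pods placed at once. A default Kubernetes scheduler that places pods one at a time would instead have bound the first three pods of the larger job and left them waiting.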

Job management extensions and improvements, e.g.:

  • Multi-pod jobs
  • Improved error handling
  • Indexed jobs
  • Others (in upstream)

Optimizations for throughput, round-trip latency, etc.

Volcano builds upon a decade and a half of experience running a wide variety of high-performance workloads at scale on several systems and platforms, combined with best-of-breed ideas and practices from the open source community.
