Publications - Aditya Bhosale

Preprints

Refereed journal articles

Ramachandran, P., Bhosale, A., Puri, K., Negi, P., Muta, A., Dinesh, A., Menon, D., Govind, R., Sanka, S., Sebastian, A. S., Sen, A., Kaushik, R., Kumar, A., Kurapati, V., Patil, M., Tavker, D., Pandey, P., Kaushik, C., Dutt, A., & Agarwal, A. (2021). PySPH: A Python-Based Framework for Smoothed Particle Hydrodynamics. ACM Trans. Math. Softw., 47(4). https://doi.org/10.1145/3460773

@article{pysph2021,
  author = {Ramachandran, Prabhu and Bhosale, Aditya and Puri, Kunal and Negi, Pawan and Muta, Abhinav and Dinesh, A. and Menon, Dileep and Govind, Rahul and Sanka, Suraj and Sebastian, Amal S. and Sen, Ananyo and Kaushik, Rohan and Kumar, Anshuman and Kurapati, Vikas and Patil, Mrinalgouda and Tavker, Deep and Pandey, Pankaj and Kaushik, Chandrashekhar and Dutt, Arkopal and Agarwal, Arpit},
  title = {PySPH: A Python-Based Framework for Smoothed Particle Hydrodynamics},
  year = {2021},
  issue_date = {December 2021},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {47},
  number = {4},
  issn = {0098-3500},
  url = {https://doi.org/10.1145/3460773},
  doi = {10.1145/3460773},
  journal = {ACM Trans. Math. Softw.},
  month = sep,
  articleno = {34},
  numpages = {38},
  keywords = {CPU, PySPH, GPU, smoothed particle hydrodynamics, open source, Python},
  file = {pysph.pdf}
}

PySPH is an open-source, Python-based framework for particle methods in general and Smoothed Particle Hydrodynamics (SPH) in particular. PySPH allows a user to define a complete SPH simulation using pure Python. High-performance code is generated from this high-level Python code and executed seamlessly on either multiple CPU cores or GPUs. It also supports distributed execution using MPI. PySPH supports a wide variety of SPH schemes and formulations. These include incompressible and compressible fluid flow, elastic dynamics, rigid body dynamics, shallow water equations, and other problems. PySPH supports a variety of boundary conditions, including mirror, periodic, solid wall, and inlet/outlet boundary conditions. The package is written to facilitate reuse and reproducibility. This article discusses the overall design of PySPH and presents several example results that demonstrate the range of features PySPH provides.

Refereed conference proceedings

Bhosale, A., Kale, L., & Kokkila-Schumacher, S. (2025). Efficient and Cost-Effective HPC on the Cloud. Proceedings of the 34th International Symposium on High-Performance Parallel and Distributed Computing. https://doi.org/10.1145/3731545.3744667

@inproceedings{flexscience25,
  author = {Bhosale, Aditya and Kale, Laxmikant and Kokkila-Schumacher, Sara},
  title = {Efficient and Cost-Effective HPC on the Cloud},
  year = {2025},
  isbn = {9798400718694},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3731545.3744667},
  doi = {10.1145/3731545.3744667},
  booktitle = {Proceedings of the 34th International Symposium on High-Performance Parallel and Distributed Computing},
  articleno = {52},
  numpages = {5},
  keywords = {HPC, spot instances, fault-tolerance, cost-efficiency},
  location = {University of Notre Dame Conference Facilities, Notre Dame, IN, USA},
  series = {HPDC '25},
  file = {flexscience25.pdf}
}

HPC applications are increasingly using cloud resources because of their cost-effectiveness. Among these resources, spot compute instances present an opportunity to run applications at deep discounts compared to on-demand instances. However, they present unique challenges for tightly coupled HPC applications because of potential interruptions. Traditional parallel programming models like MPI are not inherently fault-tolerant, and existing methods to handle these interruptions are inefficient and require significant programmer effort. In this paper, we present Charm++ as an alternative solution that natively supports fault tolerance, dynamic load balancing, and resource rescaling. We present a tool that runs Charm++ applications on a mix of on-demand and spot instances and that detects and efficiently handles spot interruptions without a shared filesystem. We show that using spot instances can result in up to 60% cost savings for our benchmark application.

Bhosale, A., Chandrasekar, K., Kale, L., & Kokkila-Schumacher, S. (2025). An elastic job scheduler for HPC applications on the cloud. Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 151–162. https://doi.org/10.1145/3731599.3767358

@inproceedings{canopie25,
  author = {Bhosale, Aditya and Chandrasekar, Kavitha and Kale, Laxmikant and Kokkila-Schumacher, Sara},
  title = {An elastic job scheduler for HPC applications on the cloud},
  year = {2025},
  isbn = {9798400718717},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3731599.3767358},
  doi = {10.1145/3731599.3767358},
  booktitle = {Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  pages = {151--162},
  numpages = {12},
  keywords = {HPC, job scheduler, kubernetes, elasticity},
  series = {SC Workshops '25},
  file = {canopie25.pdf}
}

The last few years have seen increasing adoption of the cloud for running HPC applications. The pay-as-you-go cost model of cloud resources has necessitated the development of specialized programming models and schedulers for HPC jobs to use these resources efficiently. A key aspect of efficient utilization is the ability to rescale applications on the fly. The most commonly used parallel programming models, such as MPI, have traditionally not supported autoscaling, either in a cloud environment or on supercomputers. Although recent work has added this functionality to MPI, it is still nascent and requires additional programmer effort. Charm++ is a parallel programming model that natively supports dynamic rescaling through its migratable objects paradigm. In this paper, we present a Kubernetes operator to run Charm++ applications on a Kubernetes cluster. We then present a priority-based elastic job scheduler that can dynamically rescale jobs based on the state of a Kubernetes cluster to maximize cluster utilization while minimizing response time for high-priority jobs. We show that our elastic scheduler, with the ability to rescale HPC jobs with minimal overhead, demonstrates significant performance improvements over traditional static schedulers.

Bhosale, A., Fink, Z., & Kale, L. (2024). An Abstraction for Distributed Stencil Computations Using Charm++. In P. Diehl, J. Schuchart, P. Valero-Lara, & G. Bosilca (Eds.), Asynchronous Many-Task Systems and Applications (pp. 123–134). Springer Nature Switzerland.

@inproceedings{charmstencil2024,
  author = {Bhosale, Aditya and Fink, Zane and Kale, Laxmikant},
  editor = {Diehl, Patrick and Schuchart, Joseph and Valero-Lara, Pedro and Bosilca, George},
  title = {An Abstraction for Distributed Stencil Computations Using Charm++},
  booktitle = {Asynchronous Many-Task Systems and Applications},
  year = {2024},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {123--134},
  isbn = {978-3-031-61763-8},
  file = {wamta24.pdf}
}

Python has emerged as a popular programming language for scientific computing in recent years, thanks to libraries like NumPy and SciPy. NumPy, in particular, is widely used for prototyping numerical solvers using methods such as finite difference, finite volume, and multigrid. However, NumPy's performance is confined to a single node, compelling programmers to resort to a lower-level language for running large-scale simulations. In this paper, we introduce CharmStencil, a high-level abstraction featuring a NumPy-like Python frontend and a highly efficient Charm++ backend. Employing a client-server model, CharmStencil maintains productivity with tools like Jupyter notebooks on the frontend while using a high-performance Charm++ library on the backend for computation. We demonstrate that CharmStencil achieves orders of magnitude better single-threaded performance than NumPy and can scale to thousands of CPU cores. Additionally, we show superior performance compared to cuNumeric and Numba, popular Python libraries for parallel array computations.

Ramos, E., White, S., Bhosale, A., & Kale, L. (2023). Runtime Techniques for Automatic Process Virtualization. Workshop Proceedings of the 51st International Conference on Parallel Processing. https://doi.org/10.1145/3547276.3548522

@inproceedings{ampi2023,
  author = {Ramos, Evan and White, Sam and Bhosale, Aditya and Kale, Laxmikant},
  title = {Runtime Techniques for Automatic Process Virtualization},
  year = {2023},
  isbn = {9781450394451},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3547276.3548522},
  doi = {10.1145/3547276.3548522},
  booktitle = {Workshop Proceedings of the 51st International Conference on Parallel Processing},
  articleno = {26},
  numpages = {10},
  location = {Bordeaux, France},
  series = {ICPP Workshops '22},
  file = {icpp23.pdf}
}

Asynchronous many-task runtimes look promising for the next generation of high-performance computing systems. But these runtimes are usually based on new programming models, requiring extensive programmer effort to port existing applications to them. An alternative approach is to reimagine the execution model of widely used programming APIs, such as MPI, in order to execute them more asynchronously. Virtualization is a powerful technique that can be used to execute a bulk synchronous parallel program in an asynchronous manner. Moreover, if the virtualized entities can be migrated between address spaces, the runtime can optimize execution with dynamic load balancing, fault tolerance, and other adaptive techniques. Previous work on automating process virtualization has explored compiler approaches, source-to-source refactoring tools, and runtime methods. These approaches achieve virtualization with different tradeoffs in terms of portability (across different architectures, operating systems, compilers, and linkers), programmer effort required, and the ability to handle all kinds of global state and programming languages. We implement support for three related runtime methods, discuss their shortcomings and their applicability to user-level virtualized process migration, and compare their performance to existing approaches. Compared to existing approaches, one of our new methods achieves what we consider the best overall functionality in terms of portability, automation, support for migration, and runtime performance.

Bhosale, A., & Ramachandran, P. (2020). Compyle: a Python package for parallel computing. In M. Agarwal, C. Calloway, D. Niederhut, & D. Shupe (Eds.), Proceedings of the 19th Python in Science Conference (pp. 32–39). https://doi.org/10.25080/Majora-342d178e-005

@inproceedings{compyle2020,
  author = {Bhosale, Aditya and Ramachandran, Prabhu},
  title = {Compyle: a {P}ython package for parallel computing},
  booktitle = {Proceedings of the 19th {P}ython in {S}cience {C}onference},
  editor = {Agarwal, Meghann and Calloway, Chris and Niederhut, Dillon and Shupe, David},
  year = {2020},
  pages = {32--39},
  url = {https://doi.org/10.25080/Majora-342d178e-005},
  doi = {10.25080/Majora-342d178e-005},
  file = {compyle.pdf}
}
