Many organizations today face a critical challenge in their data science and analytics operations. Data scientists, statisticians, and data developers often rely on legacy single-node processes for complex scientific computing tasks such as optimization, simulation, linear programming, and numerical computing. While these approaches may have been sufficient in the past, they are increasingly inadequate in the face of two major trends:
- The exponential growth in data volume and complexity that needs to be modeled and analyzed.
- Heightened business pressure to obtain modeling results and insights faster to enable timely decision-making.
As a result, there is an urgent need for more advanced, scalable techniques that can handle larger datasets and deliver results more quickly. However, transitioning away from established single-node processes presents several challenges:
- Existing code and workflows are often tightly coupled to single-node architectures.
- Data scientists may lack expertise in distributed computing paradigms.
- There are concerns about maintaining reproducibility and consistency when scaling to distributed environments.
- The cost and complexity of setting up and managing distributed infrastructure can be prohibitive.
To address these challenges, this repository demonstrates an expanding set of approaches leveraging the distributed computing framework Ray, implemented on the Databricks data lakehouse platform. The solutions presented aim to:
- Scale single-node processes horizontally with minimal code refactoring, preserving existing workflows where possible.
- Achieve significant improvements in runtime and performance, often by orders of magnitude.
- Enable organizations to make better, more timely business decisions based on the most up-to-date simulation or optimization results.
- Provide a smooth transition path for data scientists to adopt distributed computing practices.
- Leverage the managed infrastructure and integrated tools of the Databricks platform to simplify deployment and management.
By adopting these approaches, organizations can modernize their scientific computing capabilities on Databricks to meet the demands of today's data-intensive business environment. This allows them to unlock new insights, respond more quickly to changing conditions, and gain a competitive edge through advanced analytics at scale.
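To illustrate the "minimal code refactoring" point above, here is a hedged sketch (not code from this repo) showing how an existing single-node function can be fanned out with Ray by wrapping it in `ray.remote`, leaving its body untouched. The `simulate_trial` function and its parameters are illustrative assumptions, not part of this repository:

```python
import random


def simulate_trial(seed: int, n_samples: int = 100_000) -> float:
    """Existing single-node code: a Monte Carlo estimate of pi.

    The body stays exactly as written when it is later scaled out with Ray.
    """
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    import ray

    # On Databricks, a Ray cluster would typically be started first with
    # ray.util.spark.setup_ray_cluster(); locally, ray.init() suffices.
    ray.init()

    # Wrap the unchanged legacy function as a Ray task and run many
    # trials in parallel across the cluster.
    simulate_remote = ray.remote(simulate_trial)
    futures = [simulate_remote.remote(seed) for seed in range(32)]
    estimates = ray.get(futures)

    print(sum(estimates) / len(estimates))
    ray.shutdown()
```

The key design point is that `ray.remote` can wrap a plain Python function after the fact, so the original workflow is preserved and only the orchestration around it changes.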
This repo currently contains examples for the following scientific computing use cases:
The bin packing problem is a classic optimization challenge with significant real-world implications, and this solution demonstrates how to scale a Python library to solve it efficiently using Ray Core components.
Get started here: Bin Packing Optimization/01_intro_to_binpacking
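The repo's notebooks use the `py3dbp` library for 3D packing; as a simplified stand-in, the sketch below uses a 1D first-fit-decreasing heuristic (an assumption for illustration, not the repo's implementation) to show the Ray Core fan-out pattern: each independent packing problem becomes one remote task.

```python
def first_fit_decreasing(sizes: list[int], capacity: int) -> int:
    """Pack item sizes into bins of `capacity` using the first-fit-decreasing
    heuristic; return the number of bins used."""
    bins: list[int] = []  # remaining capacity of each open bin
    for size in sorted(sizes, reverse=True):
        for i, remaining in enumerate(bins):
            if size <= remaining:
                bins[i] -= size  # place item in the first bin it fits
                break
        else:
            bins.append(capacity - size)  # open a new bin
    return len(bins)


if __name__ == "__main__":
    import random
    import ray

    ray.init()

    # Each order is an independent packing problem, so the problems can be
    # solved in parallel as Ray tasks -- the same pattern the repo applies
    # to py3dbp at larger scale.
    ffd_remote = ray.remote(first_fit_decreasing)
    orders = [
        [random.randint(1, 10) for _ in range(500)] for _ in range(1_000)
    ]
    bin_counts = ray.get([ffd_remote.remote(order, 10) for order in orders])

    print(f"worst case across orders: {max(bin_counts)} bins")
    ray.shutdown()
```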
[email protected] [email protected]
Please note that the code in this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we make no guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of this project. The source in this project is provided subject to the Databricks License. All included or referenced third-party libraries are subject to the licenses set forth below.
Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.
© 2025 Databricks, Inc. All rights reserved. The source in this project is provided subject to the Databricks License [https://www.databricks.com/legal/db-license]. All included or referenced third-party libraries are subject to the licenses set forth below.
| library | description | license | source |
|---|---|---|---|
| ray | Framework for scaling AI/Python applications | Apache 2.0 | ray-project/ray |
| py3dbp | 3D bin packing implementation | MIT | enzoruiz/3dbinpacking |
| prometheus | Service monitoring system | Apache 2.0 | prometheus/prometheus |
| grafana | Open-source platform for monitoring and observability | AGPL-3.0-only | grafana/grafana |