Superset Ditches Rockset: What You Need To Know

by Alex Johnson

Why We're Saying Goodbye to Rockset in Superset

Hey everyone! You might have noticed some changes brewing in the Superset community, and we're here to talk about a significant one: the removal of Rockset database support from Apache Superset. This isn't a decision we take lightly, but it's a necessary step to keep Superset lean, efficient, and focused on the databases that matter to our users today. The primary driver for this change is the unfortunate reality that Rockset, as an online service, is no longer available. Following its acquisition by OpenAI in 2024, the service was officially shut down on September 30, 2024. For those who were using Rockset with Superset, it's highly likely you've already moved on to other robust database solutions. Continuing to maintain support for a defunct service would introduce unnecessary complexities to our codebase, potentially confuse new users trying to understand Superset's vast connectivity options, and leave us carrying obsolete dependencies that offer no real value.

Removing support for defunct databases is a crucial part of our ongoing maintenance strategy. It keeps the Superset codebase clean and manageable, reduces the cognitive load for newcomers evaluating what Superset can do, and pays down technical debt. This removal touches several areas: the documentation, configuration files such as pyproject.toml, the database engine specifications that define how Superset interacts with each data source, and the test suites. Pruning integrations for services that no longer exist keeps maintainer effort focused on the databases people actually use, so Superset can keep pace with the evolving data landscape.

What Was Rockset Support Like in Superset?

Before we move forward, it's worth looking at what Rockset support in Superset involves. At its core is a dedicated database engine specification. This code describes Rockset's SQL dialect to Superset, covering details such as how to bucket timestamps into time grains. The connection itself is handled by the Rockset SQLAlchemy driver. Beyond the code, our documentation explains how to connect to a Rockset instance, and Rockset's logo appears in the project's README alongside the other supported databases. The driver, rockset-sqlalchemy, is declared as an optional dependency in pyproject.toml, so anyone who needs it can install it together with Superset. A suite of unit tests verifies the Rockset integration. Rockset also appears in our database support matrices and configuration guides. In short, Rockset support is woven throughout the codebase, reflecting its former importance as a supported data source.

Reproduction Steps: How to See the Current Rockset Integration

To appreciate the scope of what we're removing, here is how to observe the existing Rockset support in Superset right now:

  • README.md: search the main README for "rockset" (case-insensitive). Rockset's logo appears among the logos of the other supported databases, telling users that Rockset is a first-class citizen in our ecosystem.
  • docs/docs/configuration/databases.mdx: this configuration guide contains a dedicated section on connecting to a Rockset database, complete with configuration examples and explanations.
  • pyproject.toml: rockset-sqlalchemy is listed as an optional dependency, which lets Superset install the driver Rockset requires.
  • superset/db_engine_specs/rockset.py: the database engine specification holding the Rockset-specific logic and configuration Superset uses when it talks to Rockset.
  • tests/unit_tests/db_engine_specs/test_rockset.py: unit tests dedicated to validating Rockset-specific functionality.
  • superset/sql/parse.py and similar files: these may contain commented-out references to Rockset within dialect mappings.

Taken together, these touchpoints show that Rockset support is fully integrated throughout the codebase, even though the service itself has been decommissioned.
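You can also check these touchpoints in one pass. The Python sketch below reports whether each file named above exists and whether it mentions Rockset. It assumes a local checkout of the Superset repository. The function survey_touchpoints is hypothetical and not part of Superset, and the file list simply mirrors the steps above.

```python
from __future__ import annotations

from pathlib import Path

# Paths taken from the reproduction steps above, relative to the repository root.
TOUCHPOINTS = [
    "README.md",
    "docs/docs/configuration/databases.mdx",
    "pyproject.toml",
    "superset/db_engine_specs/rockset.py",
    "tests/unit_tests/db_engine_specs/test_rockset.py",
    "superset/sql/parse.py",
]


def survey_touchpoints(repo_root: str | Path) -> dict[str, str]:
    """Hypothetical helper: classify each touchpoint as 'missing', 'mentions rockset', or 'clean'."""
    root = Path(repo_root)
    report: dict[str, str] = {}
    for rel in TOUCHPOINTS:
        path = root / rel
        if not path.is_file():
            report[rel] = "missing"
            continue
        # Case-insensitive check; bytes that are not valid UTF-8 are ignored.
        text = path.read_text(encoding="utf-8", errors="ignore")
        report[rel] = "mentions rockset" if "rockset" in text.lower() else "clean"
    return report
```

Before the removal, calling survey_touchpoints("path/to/superset") should flag most entries as "mentions rockset". Afterwards, the engine spec and its test file should show up as "missing", and the rest should read "clean".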

What We Expect After the Removal

Our goal with this update is straightforward: Superset should no longer have any trace of Rockset support. This means a clean sweep of all code, documentation, and configuration files related to this defunct database service. For our users, this translates to a much clearer and more accurate understanding of which databases are actively supported and maintained by the Superset project. You won't see Rockset listed as an option, nor will you find any remnants of its integration in the codebase. This streamlining effort ensures that Superset remains focused on providing robust support for currently active and widely used data platforms.

Acceptance Criteria: How We'll Know It's Done

To ensure this removal is comprehensive and successful, we've established clear acceptance criteria. These are the benchmarks we'll use to verify that all Rockset-related elements have been successfully purged from Superset:

  • [ ] Rockset logo and references removed from README.md: The README.md file will no longer display the Rockset logo or any mention of it in the list of supported databases.
  • [ ] Rockset documentation section removed from database configuration guide: The dedicated section in docs/docs/configuration/databases.mdx that previously explained how to connect to Rockset will be entirely deleted.
  • [ ] Rockset SQLAlchemy dependency removed from pyproject.toml: The rockset-sqlalchemy entry will be removed from the optional dependencies list in pyproject.toml.
  • [ ] Rockset database engine specification file deleted: The file superset/db_engine_specs/rockset.py will be completely removed from the project structure.
  • [ ] Rockset unit tests deleted: All unit tests written specifically for Rockset functionality, including tests/unit_tests/db_engine_specs/test_rockset.py and any Rockset-specific cases elsewhere, will be deleted.
  • [ ] Any remaining Rockset references cleaned up from code comments and documentation: A thorough search will be conducted across the entire codebase, including comments and other documentation files, to eliminate any lingering mentions of Rockset.
  • [ ] Database support matrix updated to exclude Rockset: Any matrices or lists that outline Superset's supported databases will be updated to reflect the removal of Rockset.

Meeting these criteria will confirm that the Rockset integration has been fully and cleanly removed from Superset.

Verifying the Rockset Removal

After we implement the changes, it's crucial to ensure that Rockset database support has been entirely eradicated from Superset. We'll employ a two-pronged approach: manual checks and automated verification. These steps are designed to catch any lingering traces and confirm the integrity of the codebase.

Manual Verification: A Hands-On Check

For manual verification, we'll run a series of direct checks. First, we'll search the entire codebase for "rockset", case-insensitively, and expect to find zero meaningful references. Second, we'll review README.md to confirm that the Rockset logo no longer appears among the supported databases, so the README accurately reflects current support. Third, we'll examine the database configuration documentation, in particular docs/docs/configuration/databases.mdx, and expect no instructions for, or mentions of, connecting to Rockset. Finally, we'll revisit pyproject.toml to confirm that the rockset-sqlalchemy dependency is gone. Together, these checks give direct, human-verified confirmation that Rockset is no longer part of Superset.
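The case-insensitive search step can be scripted. Here is a minimal sketch that walks a checkout and returns every line containing "rockset" in any casing, along with its path and line number. The skipped directories in SKIP_DIRS and the helper name find_references are our own assumptions. Deciding which remaining hits are meaningful still takes human judgment.

```python
from __future__ import annotations

import os
import re
from pathlib import Path

# Assumed list of directories holding generated or vendored content rather than sources.
SKIP_DIRS = {".git", "node_modules", "venv", ".venv", "__pycache__", "build"}
PATTERN = re.compile(r"rockset", re.IGNORECASE)


def find_references(root: str | Path) -> list[tuple[str, int, str]]:
    """Hypothetical helper: return (relative path, 1-based line number, stripped line) per match."""
    root = Path(root)
    hits: list[tuple[str, int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place (sorted for a stable order) so os.walk skips these directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            try:
                data = path.read_bytes()
            except OSError:
                continue
            if b"\0" in data:  # crude binary-file check: skip images and similar files
                continue
            text = data.decode("utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if PATTERN.search(line):
                    hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

After the removal, find_references(".") run from the repository root should return an empty list, or only hits you have reviewed and deliberately kept.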

Automated Verification: Trusting the Machines

Complementing the manual checks, we'll use automated processes to make sure the removal is thorough and introduces no regressions. First, we'll run the entire Superset test suite with pytest tests/. A passing run confirms that removing Rockset support hasn't inadvertently broken any other part of the application. Second, we'll verify that the project still builds successfully without the Rockset dependency, so the build process neither fails on nor refers to the removed component. Third, if documentation builds are part of our workflow (e.g., cd docs && npm run build), we'll run that build to confirm the documentation still generates without errors. Finally, we'll run our linters, for example pre-commit run --all-files, to ensure the cleanup hasn't introduced style or formatting issues. Together, these automated checks confirm that Rockset support was removed cleanly.
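The automated steps can be strung together with a small driver script. This sketch runs each command named above through subprocess from the standard library and records its exit code. The CHECKS list and the run_checks helper are hypothetical. The docs command assumes the npm-based build mentioned above. The project build step is left out because its exact command depends on your setup.

```python
from __future__ import annotations

import subprocess

# Hypothetical check list mirroring the commands above: (label, argv, working directory).
# Assumes the script runs from the repository root.
CHECKS: list[tuple[str, list[str], str]] = [
    ("tests", ["pytest", "tests/"], "."),
    ("docs build", ["npm", "run", "build"], "docs"),
    ("linters", ["pre-commit", "run", "--all-files"], "."),
]


def run_checks(checks: list[tuple[str, list[str], str]]) -> dict[str, int]:
    """Hypothetical helper: run each check and map its label to the exit code (0 = success)."""
    results: dict[str, int] = {}
    for label, argv, cwd in checks:
        try:
            completed = subprocess.run(argv, cwd=cwd, check=False)
            results[label] = completed.returncode
        except FileNotFoundError:
            # The tool or the working directory is missing; use the shell's "not found" code.
            results[label] = 127
    return results
```

For example, failures = [name for name, code in run_checks(CHECKS).items() if code != 0] collects the checks to investigate. An empty list means every check passed. Recording exit codes instead of stopping at the first failure lets a single run surface every broken step.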

Next Steps and Resources

As we move forward with removing Rockset support, we encourage the community to stay informed. This change is a positive step towards a more streamlined and maintainable Superset. If you're looking for alternative data warehousing and analytics solutions, you might find these resources helpful:

  • Explore the official Apache Superset Documentation for the latest information on supported databases and features.
  • Learn more about modern data stack technologies by visiting The Data Stack Show, a great resource for understanding trends and tools in the data world.
  • For broader discussions on open-source data platforms, the Apache Software Foundation website is an excellent place to explore other projects and initiatives.