Superset Drops Rockset Database Support

by Alex Johnson

The End of an Era: Why We're Saying Goodbye to Rockset in Superset

As many of you in the data visualization and business intelligence community know, Superset keeps evolving, and part of that evolution is keeping its list of supported databases relevant and up to date. Today we're announcing a necessary change: we are removing support for the Rockset database from Apache Superset. The reason is straightforward: Rockset was acquired by OpenAI, and its hosted service was shut down on September 30, 2024. The service is no longer available to the public, and any remaining users have, or certainly should have, moved on to actively maintained database solutions.

Continuing to support a database that no longer exists would create several problems. First, it adds an unnecessary maintenance burden: every supported database requires ongoing attention, updates, and testing to ensure compatibility and security, and with Rockset offline that work is wasted. Second, it can confuse new users browsing Superset's list of supported databases, who might see Rockset, assume it is a viable option, and waste effort trying to use it. Finally, obsolete dependencies, even unused ones, bloat the codebase and can introduce subtle security risks or conflicts down the line. Removing deprecated database support is therefore not just tidying up; it keeps the codebase clean, reduces confusion for people evaluating Superset, and eliminates technical debt.

The removal touches several parts of Superset: the documentation, the configuration files, the database engine specifications, and the test suites. We believe this will benefit the entire Superset community by focusing our resources on databases that are actively used and developed.

What This Means for Superset: A Look at the Current and Expected Behavior

Right now, Rockset is fully integrated into the Superset project as a supported database backend. This is more than a passing mention: there is a dedicated database engine specification, documentation explaining how to connect to Rockset, and Rockset's logo displayed in the README among the other supported databases. Under the hood, the project lists the Rockset SQLAlchemy driver, rockset-sqlalchemy, as an optional dependency in pyproject.toml, and it carries unit tests for Rockset-only functionality. References also appear in the database support matrices, in the configuration guides, and in code such as superset/sql/parse.py, where Rockset may appear in dialect mappings, sometimes commented out.

You can confirm this with a few simple checks. Open README.md and search for "rockset"; you'll see its logo. In docs/docs/configuration/databases.mdx you'll find a dedicated section on connecting to Rockset. pyproject.toml lists rockset-sqlalchemy as an optional dependency. superset/db_engine_specs/rockset.py contains a complete database engine specification, and tests/unit_tests/db_engine_specs/test_rockset.py contains its unit tests. In superset/sql/parse.py you may find commented-out references. In short, Rockset support is embedded throughout the codebase.

With the service now defunct, this state of affairs is no longer appropriate. The expected behavior is that all Rockset-related code, documentation, and configuration is removed from Superset. Users should no longer see Rockset listed as a supported database anywhere in the Superset interface or documentation, and the codebase should be free of references to it. To achieve this, we've outlined specific acceptance criteria. First, the Rockset logo and all mentions of Rockset are removed from README.md. Second, the dedicated Rockset section in the database configuration guide, docs/docs/configuration/databases.mdx, is deleted. Third, the rockset-sqlalchemy dependency is removed from pyproject.toml. Fourth, the Rockset database engine specification file, superset/db_engine_specs/rockset.py, is deleted. Fifth, the Rockset unit tests in tests/unit_tests/db_engine_specs/test_rockset.py are removed. Sixth, any remaining Rockset references, whether in code comments or documentation files, are cleaned up. Finally, the database support matrix is updated to reflect that Rockset is no longer supported. Meeting these criteria keeps Superset a streamlined and relevant tool for our users.
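The path-based criteria above lend themselves to a small checklist script. This is a sketch under stated assumptions, not part of Superset: the function name check_removal and the two path lists are hypothetical, built only from the file paths named in this article. It does not cover the support matrix or stray references in other files.

```python
from pathlib import Path

# Files the acceptance criteria say should be deleted outright.
FILES_TO_DELETE = [
    "superset/db_engine_specs/rockset.py",
    "tests/unit_tests/db_engine_specs/test_rockset.py",
]

# Files that should remain but no longer mention Rockset.
FILES_TO_CLEAN = [
    "README.md",
    "docs/docs/configuration/databases.mdx",
    "pyproject.toml",
]


def check_removal(repo_root: Path) -> list[str]:
    """Return human-readable failures; an empty list means every check passed."""
    failures: list[str] = []
    for rel in FILES_TO_DELETE:
        if (repo_root / rel).exists():
            failures.append(f"still present: {rel}")
    for rel in FILES_TO_CLEAN:
        path = repo_root / rel
        if not path.is_file():
            # A missing file usually means the script ran from the wrong directory.
            failures.append(f"missing file: {rel}")
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if "rockset" in text.lower():
            failures.append(f"still mentions Rockset: {rel}")
    return failures
```

Run from a checkout, check_removal(Path(".")) returns one line per unmet criterion, which makes it easy to see what is left to do.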

Ensuring a Smooth Transition: Verification and Next Steps

To make sure the removal of Rockset support is thorough and doesn't introduce any unintended issues, we have a robust verification process in place. This process involves both manual and automated checks to guarantee that all traces of Rockset are gone and that the project remains stable.

Manual Verification Steps:

First, we'll search the entire codebase. A case-insensitive search such as grep -ri "rockset" . should yield no meaningful results related to active functionality (the -i flag is what makes the match case-insensitive, so that "Rockset" and "ROCKSET" are caught too). This ensures that no stray code snippets or references remain hidden. Next, we'll visually inspect key areas. We'll review README.md to confirm that the Rockset logo no longer appears among the supported databases. We'll check the database configuration documentation, docs/docs/configuration/databases.mdx, to ensure that no instructions or sections on connecting to Rockset remain. Lastly, we'll examine pyproject.toml to confirm that the rockset-sqlalchemy dependency has been removed. These manual checks provide a high-level confirmation that the removal is complete and visible.
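For contributors who prefer a script to grep, the same case-insensitive search can be sketched in Python. The names find_mentions and SKIP_DIRS are hypothetical, and the set of skipped directories is an assumption about what is not worth scanning. Unlike a plain content search, this sketch also flags file names containing the word, which would catch a leftover logo image.

```python
import re
from pathlib import Path

# Assumption: directories that hold VCS metadata or generated artifacts.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def find_mentions(root: Path, word: str = "rockset") -> list[tuple[str, int, str]]:
    """Return (relative_path, line_number, line_text) for each case-insensitive hit.

    A hit on the file name itself is reported with line number 0.
    """
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        if pattern.search(path.name):
            hits.append((rel.as_posix(), 0, "<file name>"))
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file: only its name was checked
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((rel.as_posix(), lineno, line.strip()))
    return hits
```

Calling find_mentions(Path(".")) from the repository root and printing each tuple gives a list to review by hand. Some hits, such as changelog entries, may be legitimate and need a human decision.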

Automated Verification Steps:

Complementing the manual checks, automated checks catch regressions that a visual review would miss. A crucial step is to run the entire test suite with pytest tests/. This ensures that removing Rockset's code and dependencies hasn't inadvertently broken tests for other features. We also need to verify that the project builds successfully without the Rockset dependency, confirming that all build configurations are correctly updated. If applicable, we'll ensure the documentation builds without errors by running cd docs && npm run build. Finally, we'll run our linters with pre-commit run --all-files to maintain code quality and ensure that the changes adhere to our coding standards. These automated steps are vital for maintaining the integrity and stability of the Superset project.
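The commands quoted above can be chained in a small runner so that one invocation reports which step failed. This is a minimal sketch, assuming pytest, npm, and pre-commit are installed and on PATH and that it is run from the repository root. The names CHECKS and run_checks are hypothetical.

```python
import subprocess

# Each entry is (argv, working_directory). The docs build runs inside docs/,
# mirroring "cd docs && npm run build" from the text.
CHECKS = [
    (["pytest", "tests/"], "."),
    (["npm", "run", "build"], "docs"),
    (["pre-commit", "run", "--all-files"], "."),
]


def run_checks(checks, stop_on_failure: bool = True) -> list[tuple[str, int]]:
    """Run each command in order and return (command_string, exit_code) pairs.

    With stop_on_failure=True the run ends at the first non-zero exit code,
    so the last pair in the result identifies the failing step.
    """
    results: list[tuple[str, int]] = []
    for argv, cwd in checks:
        completed = subprocess.run(argv, cwd=cwd)
        results.append((" ".join(argv), completed.returncode))
        if completed.returncode != 0 and stop_on_failure:
            break
    return results
```

A caller might run run_checks(CHECKS) and treat any non-zero exit code in the result as a failed verification. Passing stop_on_failure=False collects every result instead of stopping early.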

Submission and Community Collaboration:

For those interested in contributing to this effort or verifying the changes, we've provided clear guidelines. You can record your screen with a tool like cap.so to demonstrate the removal of Rockset support and export the recording as an MP4 for submission. We also have a guide on submitting pull requests: https://hackmd.io/@timothy1ee/Hky8kV3hlx. Your contributions are invaluable in keeping Apache Superset a robust and well-maintained platform. Removing outdated support lets the project focus on the technologies that matter most to its active user base, and it frees resources for enhancing existing features and exploring new integrations. We appreciate your understanding and support as we continue to refine Apache Superset.

For further information on managing database connections in Apache Superset, please refer to the official documentation on Connecting to Data Sources. For general Apache Superset updates and community discussions, the Apache Superset Blog is an excellent resource.