[CALCITE-6728] Introduce new methods to lookup tables and schemas inside schemas #4100

kramerul · 2024-12-19T13:19:22Z

Motivation

For databases with a huge set of schemas and tables, it takes quite a long time to prepare queries. Currently, all tables/schemas are loaded into memory.

Caching all these schemas and tables is not an option:

It would require a lot of memory.
The cache would have to be evicted quite often, since it is likely that one of these tables changes every second.

Therefore, we tried to find a way to load only those tables/schemas that are required to prepare a query.

API Changes

This PR introduces a new mechanism to look up tables and schemas within a schema. For this purpose, a new interface is introduced:

public interface Lookup<T> {
  @Nullable T get(String name);
  @Nullable Named<T> getIgnoreCase(String name);
  Set<String> getNames(LikePattern pattern);
}

The LikePattern was extracted from CalciteMetaImpl to hold a pattern that can be used to query tables and schemas inside a JDBC database using the LIKE operator. It also supports conversion to a Predicate1<String>, which can be used to implement filters in plain Java.
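To illustrate the idea of converting a LIKE pattern into a plain Java filter, here is a minimal sketch. It is not Calcite's LikePattern: the class name LikeToPredicate is hypothetical, java.util.function.Predicate stands in for Predicate1<String>, and escape characters are deliberately not handled.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical helper (not Calcite's LikePattern): SQL LIKE pattern to Java predicate. */
final class LikeToPredicate {
  private LikeToPredicate() {}

  /**
   * Translates '%' to "any sequence" and '_' to "any single character".
   * Every other character is matched literally. LIKE escape characters
   * are not supported in this sketch.
   */
  static Predicate<String> toPredicate(String like) {
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < like.length(); i++) {
      char c = like.charAt(i);
      if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else {
        // Quote the character so regex metacharacters such as '.' stay literal.
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    Pattern compiled = Pattern.compile(regex.toString(), Pattern.DOTALL);
    return name -> compiled.matcher(name).matches();
  }
}
```

With this sketch, LikeToPredicate.toPredicate("EMP%") accepts "EMPLOYEES" and rejects "DEPT", so the same pattern can filter an in-memory collection of names or be passed to a JDBC metadata call.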

The Schema now uses this Lookup interface to find schemas and tables. It could also be extended to functions and types.

public interface Schema {
  default Lookup<Table> tables() {
    ...
  }
  default Lookup<? extends Schema> subSchemas() {
    ...
  }
  ...
}

Implementation

The case-insensitive search is now implemented directly in the specific Schema, using a matching implementation of the Lookup interface. Formerly, it was done in CalciteSchema.
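As a rough illustration of what it means for a lookup to own both the case-sensitive and the case-insensitive search, here is a small map-backed sketch. All names are hypothetical and the types are simplified: Map.Entry stands in for Named<T>, and a java.util.function.Predicate stands in for LikePattern. It is not the implementation from this PR.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical, simplified map-backed lookup (not a Calcite class). */
final class SimpleMapLookup<T> {
  private final Map<String, T> byName = new LinkedHashMap<>();
  // Maps the lower-cased name to the first name registered with that spelling.
  private final Map<String, String> canonicalByLower = new LinkedHashMap<>();

  void put(String name, T value) {
    byName.put(name, value);
    canonicalByLower.putIfAbsent(name.toLowerCase(Locale.ROOT), name);
  }

  /** Case-sensitive lookup; returns null if the name is unknown. */
  T get(String name) {
    return byName.get(name);
  }

  /** Case-insensitive lookup; returns the stored name together with the value, or null. */
  Map.Entry<String, T> getIgnoreCase(String name) {
    String canonical = canonicalByLower.get(name.toLowerCase(Locale.ROOT));
    if (canonical == null) {
      return null;
    }
    return new AbstractMap.SimpleImmutableEntry<>(canonical, byName.get(canonical));
  }

  /** Returns all stored names accepted by the filter, in sorted order. */
  Set<String> getNames(Predicate<String> filter) {
    Set<String> names = new TreeSet<>();
    for (String name : byName.keySet()) {
      if (filter.test(name)) {
        names.add(name);
      }
    }
    return names;
  }
}
```

Returning the stored name from getIgnoreCase lets the caller learn the actual spelling of the object it asked for, which is the role Named<T> appears to play in the interface above.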

JdbcSchema and JdbcCatalogSchema use a special implementation of Lookup: LoadingCacheLookup. This implementation uses a LoadingCache internally to speed things up. If only case-sensitive schema/table lookup is required, this can be done quite fast, since DatabaseMetaData#getTables can be used to query a single table. The result is cached inside the LoadingCache for one minute.
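The per-name caching described here can be sketched with the standard library alone. The class below is a hypothetical stand-in for a time-limited loading cache, not LoadingCacheLookup and not Guava's LoadingCache; the loader function represents a single-table metadata query, and the injectable clock exists only to make the expiry testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch: loads one entry per name on demand and keeps it for a fixed time. */
final class ExpiringLookupCache<T> {
  /** Cached value plus the time it was loaded; the value may be null (name not found). */
  private static final class Entry<V> {
    final V value;
    final long loadedAtMillis;

    Entry(V value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final Map<String, Entry<T>> entries = new ConcurrentHashMap<>();
  private final Function<String, T> loader;
  private final long ttlMillis;
  private final LongSupplier clock;

  ExpiringLookupCache(Function<String, T> loader, long ttlMillis, LongSupplier clock) {
    this.loader = loader;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Returns the cached value for the name, reloading it if it is missing or expired. */
  T get(String name) {
    long now = clock.getAsLong();
    Entry<T> entry = entries.compute(name, (key, old) -> {
      if (old != null && now - old.loadedAtMillis < ttlMillis) {
        return old; // still fresh: no call to the loader
      }
      return new Entry<T>(loader.apply(key), now);
    });
    return entry.value;
  }
}
```

Only the names that are actually requested are ever loaded, which is the property that matters for schemas with very many tables. In this sketch a miss (null) is cached as well, so a repeated lookup of a nonexistent table does not hit the database again within the time limit; whether the PR's implementation behaves the same way is not stated in the description.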

Unfortunately, DatabaseMetaData#getTables doesn't support case-insensitive queries. In this case, it is still necessary to load all database tables to perform case-insensitive lookups.

The performance gain for huge sets of tables/schemas in database schemas can only be achieved if caching is turned off in Calcite (SimpleCalciteSchema is used instead of CachingCalciteSchema).

I tried to keep the behavior of CachingCalciteSchema exactly the same. This behavior includes that all tables/schemas are loaded into memory. CachedLookup is used to achieve this.

sonarqubecloud · 2024-12-19T13:49:05Z

Quality Gate passed

Issues
20 New issues
0 Accepted issues

Measures
0 Security Hotspots
82.0% Coverage on New Code
0.2% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2025-02-02T03:35:25Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@calcite.apache.org list. Thank you for your contributions.

kramerul · 2025-02-03T05:47:55Z

We are still interested in getting this PR merged.

mihaibudiu · 2025-02-04T03:03:12Z

I will try to review this, although it's in an area where I don't know much about the codebase.

kramerul · 2025-02-04T07:35:55Z

I know that this PR is quite large. I discussed with Julian Hyde whether it makes sense to open such a PR. For details, see https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-6728

mihaibudiu

Has any benchmarking been done to prove the efficiency of this approach?
I am not an expert in this part of the code, but the PR looks pretty good to me.
I have only made "syntactic" comments.

core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcBaseSchema.java

mihaibudiu · 2025-02-04T18:54:03Z

core/src/test/java/org/apache/calcite/test/JdbcTest.java

@@ -7489,7 +7488,7 @@ private void checkGetTimestamp(Connection con) throws SQLException {
    aSchema.setCacheEnabled(true);

    // explicit should win implicit.
-    assertThat(aSchema.getSubSchemaNames(), hasSize(1));
+    assertThat(aSchema.subSchemas().getNames(LikePattern.any()), hasSize(1));


Why change all these tests? Can't the original function still be used?

kramerul · 2025-02-10

I first started with a @Deprecated annotation on getSubSchemaNames(). Therefore, I needed to replace it in all tests.
Afterwards, I removed the @Deprecated annotation because it might cause trouble in some cases.

I could revert this change if you prefer.

core/src/test/java/org/apache/calcite/schema/lookup/MapLookup.java

core/src/main/java/org/apache/calcite/jdbc/CalciteSchema.java

core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcCatalogSchema.java

core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcSchema.java

kramerul · 2025-02-12T06:09:44Z

Has any benchmarking been done to prove the efficiency of this approach? I am not an expert in this part of the code, but the PR looks pretty good to me. I have only made "syntactic" comments.

This PR only improves performance for huge databases. We are using a database with more than 500000 schemas containing up to 500000 tables.

In such an environment, it takes more than 10 seconds to load all table names from the database. Formerly, this was necessary during the preparation of each query. With the new approach, only the tables involved in the query are loaded from the database. This speeds up preparation severalfold. It also takes much less memory, because it is no longer necessary to hold a list of all tables in memory (snapshot).

sonarqubecloud · 2025-02-12T06:29:59Z

Quality Gate passed

Issues
21 New issues
0 Accepted issues

Measures
0 Security Hotspots
82.3% Coverage on New Code
0.1% Duplication on New Code

See analysis details on SonarQube Cloud

mihaibudiu · 2025-02-12T17:56:19Z

Please ask for a new review when you are done

mihaibudiu

This is pretty good, I think one more iteration and we can merge it.
Regarding the deprecation, I would trust @asolimando's expertise.

mihaibudiu · 2025-02-22T23:01:18Z

core/src/main/java/org/apache/calcite/schema/Schema.java

  /**
   * Returns a table with a given name, or null if not found.
   *
+   * <p>Please use {@link Schema#tables()} and {@link Lookup#get(String)} instead.


If these methods are not used anywhere anymore, then it may be good to mark them as deprecated, and document this in history.md.

core/src/main/java/org/apache/calcite/schema/Schema.java

core/src/main/java/org/apache/calcite/schema/lookup/CompatibilityLookup.java

core/src/main/java/org/apache/calcite/schema/lookup/ConcatLookup.java

core/src/main/java/org/apache/calcite/schema/lookup/Lookup.java

core/src/main/java/org/apache/calcite/schema/lookup/SnapshotLookup.java

core/src/main/java/org/apache/calcite/schema/lookup/TransformingLookup.java

core/src/main/java/org/apache/calcite/util/LazyReference.java

mihaibudiu · 2025-02-28T18:18:43Z

site/_docs/history.md

+introduces new methods to lookup tables and sub schemas inside schemas.
+The methods used before (`Schema:getTable(String name)`, `Schema:getTableNames()`,
+`Schema.getSubSchema(String name)` and `Schema.getSubSchemaNames(String name)`)
+have been markes as deprecated.


mihaibudiu · 2025-02-28T18:19:14Z

I think this is ready for merging; please fix the typo when you squash the commits.

kramerul · 2025-03-04T09:11:08Z

I fixed the typo and squashed all commits.

kramerul marked this pull request as ready for review January 2, 2025 08:07

github-actions bot added the stale label Feb 2, 2025

github-actions bot removed the stale label Feb 4, 2025

mihaibudiu reviewed Feb 4, 2025

kramerul force-pushed the CALCITE-6728 branch from a34f9f4 to f67b532 Compare February 12, 2025 05:55

kramerul force-pushed the CALCITE-6728 branch from f67b532 to 3ba680f Compare February 12, 2025 07:37

kramerul force-pushed the CALCITE-6728 branch from 3ba680f to 62598b7 Compare February 13, 2025 05:27

F21 force-pushed the main branch from 7d38212 to cacf36a Compare February 17, 2025 03:33

kramerul requested a review from mihaibudiu February 17, 2025 05:58

mihaibudiu reviewed Feb 22, 2025

kramerul force-pushed the CALCITE-6728 branch from da337e7 to e0dea42 Compare February 25, 2025 13:15

mihaibudiu approved these changes Feb 28, 2025

mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Feb 28, 2025

[CALCITE-6728] Introduce new methods to lookup tables and schemas ins…

052747e

…ide schemas [CALCITE-6728] Changes due to the PR review [CALCITE-6728] Changes due to the second PR review [CALCITE-6728] Fix typo

kramerul force-pushed the CALCITE-6728 branch from e0dea42 to 052747e Compare March 4, 2025 09:10

mihaibudiu merged commit efafa4f into apache:main Mar 6, 2025
18 of 19 checks passed
