-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6728] Introduce new methods to lookup tables and schemas inside schemas #4100
Conversation
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
We are still interested in getting this PR merged. |
I will try to review this, although it's in an area where I don't know much about the codebase. |
I know that this PR is quite huge. I discussed it with Julian Hyde, if it makes sense to open such a PR. For details see https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-6728 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has any benchmarking been done to prove the efficiency of this approach?
I am not an expert in this part of the code, but the PR looks pretty good to me.
I have only made "syntactic" comments.
core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcBaseSchema.java
Outdated
Show resolved
Hide resolved
@@ -7489,7 +7488,7 @@ private void checkGetTimestamp(Connection con) throws SQLException { | |||
aSchema.setCacheEnabled(true); | |||
|
|||
// explicit should win implicit. | |||
assertThat(aSchema.getSubSchemaNames(), hasSize(1)); | |||
assertThat(aSchema.subSchemas().getNames(LikePattern.any()), hasSize(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change all these tests, can't the original function still be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I first startet with a @Deprecated
annotation on getSubSchemaNames()
. Therefore, I needed to rename it in all tests.
Afterwards, I removed the @Deprecated
annotation because it might cause trouble in some cases.
I could revert this change, if you prefer this.
core/src/test/java/org/apache/calcite/schema/lookup/MapLookup.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/calcite/schema/lookup/MapLookup.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcCatalogSchema.java
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcSchema.java
Outdated
Show resolved
Hide resolved
a34f9f4
to
f67b532
Compare
This PR only improves performance for huge database. We are using a database with more than 500000 schemas containing up to 500000 tables. In such an environment, it takes more than 10 seconds to load all table names from the database. Formerly, this was necessary during the preparation of each query. With the new approach, only the involved tables are loaded from the database. This speeds up the preparation by factors. It also takes much less memory, because it's no longer necessary to hold a list of all tables in memory (snapshot). |
|
f67b532
to
3ba680f
Compare
Please ask for a new review when you are done |
3ba680f
to
62598b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty good, I think one more iteration and we can merge it.
Regarding the deprecation, I would trust @asolimando's expertise.
/** | ||
* Returns a table with a given name, or null if not found. | ||
* | ||
* <p>Please use {@link Schema#tables()} and {@link Lookup#get(String)} instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these methods are not used anywhere anymore, then it may be good to mark them as deprecated, and document this in history.md.
core/src/main/java/org/apache/calcite/schema/lookup/CompatibilityLookup.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/schema/lookup/CompatibilityLookup.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/schema/lookup/ConcatLookup.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/schema/lookup/Lookup.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/schema/lookup/Lookup.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/schema/lookup/TransformingLookup.java
Outdated
Show resolved
Hide resolved
da337e7
to
e0dea42
Compare
site/_docs/history.md
Outdated
introduces new methods to lookup tables and sub schemas inside schemas. | ||
The methods used before (`Schema:getTable(String name)`, `Schema:getTableNames()`, | ||
`Schema.getSubSchema(String name)` and `Schema.getSubSchemaNames(String name)`) | ||
have been markes as deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
I think this is ready for merging; please fix the typo when you squash the commits. |
…ide schemas [CALCITE-6728] Changes due to the PR review [CALCITE-6728] Changes due to the second PR review [CALCITE-6728] Fix typo
I fixed the type and squashed all commits. |
Motivation
For databases with a huge set of schemas and tables it takes quite long to prepare queries. Currently all tables/schemas are loaded into memory.
Caching all these schemas and tables is not an option
Therefore, we tried to find a way to load only those tables/schemas, which are required to prepare a query.
API Changes
This PR introduces a new mechanism to lookup tables and schemas within a schema. For this purpose a new interface is introduced
The
LikePattern
was extracted fromCalciteMetaImpl
to hold a pattern, which can be used to query tables and schemas inside a JDBC database using theLIKE
operator. Additionally, it also supports the conversion to aPredicate1<String>
which can be used to implement filters in plain java.The
Schema
is now using thisLookup
interface to find schemas and tables. It could be also extended to functions and types.Implementation
The case insensitive search is now directly implemented in the specific
Schema
using matching implementation of theLookup
interface. Formerly, it was done in theCalciteSchema
.JdbcSchema
andJdbcCatalogSchema
are using a special implementation ofLookup
:LoadingCacheLookup
. This implementation is using aLoadingCache
inside to speed up things. If only case sensitive schema/table lookup is required, this can be done quite fast sinceDatabaseMetaData#getTables
can be used to query a single table. The result is cached inside theLoadingCache
for one minute.Unfortunately
DatabaseMetaData#getTables
doesn't support case insensitive queries. In this case, it's still required to load all database tables to perform case insensitive lookups.The performance gain for huge sets of tables/schemas in database schemas can only be achieved if caching is turned off in Calcite (
SimpleCalciteSchema
is used instead ofCachingCalciteSchema
).I tried to keep the behavior of
CachingCalciteSchema
exactly the same. This behavior includes that all tables/schemas are loaded into memory.CachedLookup
is used to achieve this.