Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support JSONPath #22513

Open
16 of 18 tasks
justinj opened this issue Feb 8, 2018 · 3 comments
Open
16 of 18 tasks

sql: support JSONPath #22513

justinj opened this issue Feb 8, 2018 · 3 comments
Assignees
Labels
A-sql-json JSON handling in SQL. A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-starter Might be suitable for a starter project for new employees or team members. meta-issue Contains a list of several other issues. T-sql-queries SQL Queries Team target-release-25.2.0

Comments

@justinj
Copy link
Contributor

justinj commented Feb 8, 2018

Spec is available here. There's a good blog post covering the feature in Postgres here. I have an incomplete and non-Cockroach-compatible implementation here which includes a goyacc parser, which might be helpful.

Postgres documentation: https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH

I think this could potentially make a decent intern project, in terms of size and scope. It is not a feature of Postgres 10 but will be a feature of Postgres 11.

Implementation

Supported Features

Jira issue: CRDB-5852
Epic: CRDB-12464

gz#18507

@justinj justinj added feature A-sql-json JSON handling in SQL. labels Feb 8, 2018
@justinj justinj added this to the Later milestone Feb 8, 2018
@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed feature labels Apr 24, 2018
@knz knz added C-wishlist A wishlist feature. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-wishlist A wishlist feature. labels May 2, 2018
@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@jordanlewis jordanlewis added the A-sql-pgcompat Semantic compatibility with PostgreSQL label Nov 27, 2019
craig bot pushed a commit that referenced this issue Sep 18, 2020
54565: parser: parse REINDEX SCHEMA r=arulajmani a=otan

Refs: #51424

Release note: None

54568: sql: add unimplemented errors for jsonpath types and builtins r=arulajmani a=otan

Refs #22513 , #51424

Release note: None

54573: builtins: add unimplemented errors for full text search builtins r=arulajmani a=otan

Refs: #7821, #51424 

Release note: None

54575: parser: add unimplemented errors for CREATE/DROP ACCESS METHOD r=arulajmani a=otan

No issue number for this one (same as aggregate) as there's no way I think we can
realistically support this. Telemetry still there though.

Refs: #51424

Release note: None

Co-authored-by: Oliver Tan <[email protected]>
@brendesp
Copy link

This is a feature we use in PostgreSQL, that we will need to migrate to CockroachDB.

@theory
Copy link

theory commented Jun 21, 2022

Support for JSON path is essential for wildcard searches. Here's an example using movies.json, converted to hold one object per line via this command:

perl -i -pe 's/\]\},/]}\n/g' movies.json

Load it up into Postgres:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'sed "s|\\\\|\\\\\\\\|g" movies.json';
CREATE INDEX ON movies USING GIN (movie jsonb_path_ops);

If I want a list of all movies where Carrie Fisher is a cast member:

try=# select id from movies where movie @@ '$.cast[*] == "Carrie Fisher"';
  id   
-------
 47778
 48063
 48369
 48510
 48667
 48921
 49013
 49510
 49799
 50103
 50330
 50555
 50781
 51001
 55509
 57340
(16 rows)

This is because the cast field is an array, and cast members can appear anywhere in the array. This is simply not possible using @>, because it supports no wildcard search.

The closest equivalent I can get in Cockroach is:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from 'movies.json';
CREATE INDEX ON movies USING GIN (movie);

A couple of differences here from the Postgres example:

  • No need to escape the backslashes in the JSON; presumably Cockroach's COPY command does not support any backslashed characters
  • No support for jsonb_path_ops jsonb indexing (yet)

No big deal, but as there are no wildcards, we can search only one at a time:

select id from movies where movie->'cast'->>0 = 'Carrie Fisher';
         id         
--------------------
 772585567540477953

Now we can work around this by putting the array in an ANY() expression; this returns the expected 16 rows:

try=> select id from movies where 'Carrie Fisher' = ANY(ARRAY(SELECT jsonb_array_elements_text(movie->'cast')));
         id         
--------------------
 772585566846091265
 772585567098601473
 772585567355633665
 772585567540477953
 772585567631114241
 772585567923175425
 772585568023576577
 772585568487866369
 772585568738705409
 772585569013727233
 772585569180876801
 772585569342947329
 772585569525989377
 772585569765097473
 772585573808635905
 772585575588462593

However it cannot use the GIN index:

try=> explain analyze select id from movies where 'Carrie Fisher' = ANY(ARRAY(SELECT jsonb_array_elements_text(movie->'cast')));
                                               info                                                
---------------------------------------------------------------------------------------------------
 planning time: 9ms
 execution time: 35.9s
 distribution: local
 vectorized: true
 rows read from KV: 30,395 (5.3 MiB)
 cumulative time spent in KV: 3.4s
 maximum memory usage: 309 MiB
 network usage: 0 B (0 messages)
 
 • filter
 │ nodes: n1
 │ actual row count: 16
 │ estimated row count: 10,090
 │ filter: 'Carrie Fisher' = ANY COALESCE(array_agg, ARRAY[])
 │
 └── • render
     │ nodes: n1
     │ actual row count: 30,395
     │ estimated row count: 30,269
     │
     └── • group
         │ nodes: n1
         │ actual row count: 30,395
         │ estimated row count: 30,269
         │ group by: id
         │
         └── • apply join (left outer)
             │ nodes: n1
             │ actual row count: 86,511
             │ estimated row count: 303,950
             │
             └── • scan
                   nodes: n1
                   actual row count: 30,395
                   KV time: 3.4s
                   KV contention time: 0µs
                   KV rows read: 30,395
                   KV bytes read: 5.3 MiB
                   estimated row count: 30,395 (100% of the table; stats collected 31 minutes ago)
                   table: movies@primary
                   spans: FULL SCAN
(41 rows)

Time: 36067.055 ms (00:36.067)

Contrast that table scan with this plan from Postgres using jsonpath:

try=# explain analyse select id from movies where movie @@ '$.cast[*] == "Carrie Fisher"';                                                               QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on movies  (cost=12.02..23.46 rows=3 width=4) (actual time=0.054..0.176 rows=16 loops=1)
   Recheck Cond: (movie @@ '($."cast"[*] == "Carrie Fisher")'::jsonpath)
   Heap Blocks: exact=16
   ->  Bitmap Index Scan on movies_movie_idx1  (cost=0.00..12.02 rows=3 width=0) (actual time=0.033..0.033 rows=16 loops=1)
         Index Cond: (movie @@ '($."cast"[*] == "Carrie Fisher")'::jsonpath)
 Planning Time: 1.109 ms
 Execution Time: 0.253 ms

@theory
Copy link

theory commented Jun 21, 2022

Aaaaaand I just figured out that @> does the trick:

try=> select id from movies where movie @> '{"cast":["Carrie Fisher"]}';
         id         
--------------------
 772585566846091265
 772585567098601473
 772585567355633665
 772585567540477953
 772585567631114241
 772585567923175425
 772585568023576577
 772585568487866369
 772585568738705409
 772585569013727233
 772585569180876801
 772585569342947329
 772585569525989377
 772585569765097473
 772585573808635905
 772585575588462593
(16 rows)

Time: 132.415 ms
try=> explain analyze select id from movies where movie @> '{"cast":["Carrie Fisher"]}';
                                    info                                     
-----------------------------------------------------------------------------
 planning time: 14ms
 execution time: 15ms
 distribution: local
 vectorized: true
 rows read from KV: 16 (992 B)
 cumulative time spent in KV: 10ms
 maximum memory usage: 20 KiB
 network usage: 0 B (0 messages)
 
 • scan
   nodes: n1
   actual row count: 16
   KV time: 10ms
   KV contention time: 0µs
   KV rows read: 16
   KV bytes read: 992 B
   estimated row count: 0 (<0.01% of the table; stats collected 2 hours ago)
   table: movies@movies_movie_idx
   spans: 1 span
(19 rows)

Time: 38.905 ms

So much for the array wildcard argument!

@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Dec 10, 2022
@rafiss rafiss added the E-starter Might be suitable for a starter project for new employees or team members. label Dec 15, 2022
@mgartner mgartner moved this to New Backlog in SQL Queries Jul 24, 2023
@michae2 michae2 moved this from New Backlog to 24.1 Release in SQL Queries Aug 21, 2023
@mgartner mgartner moved this from 24.1 Release to 24.2 Release in SQL Queries Nov 27, 2023
@mgartner mgartner moved this from 24.2 Release to Backlog in SQL Queries May 16, 2024
normanchenn added a commit to normanchenn/cockroach that referenced this issue Feb 22, 2025
As part of work to implement jsonpath queries, this PR implements the
`jsonpath` type within CockroachDB. There is no parsing or evaluation
added, and the `jsonpath` will currently accept any non-empty string.
Table creation with jsonpath columns are not supported as of now
until an encoding and decoding schema is determined. Indexes are not
supported for `jsonpath` columns when they will be introduced.

Part of: cockroachdb#22513
Release note (sql change): Adding jsonpath type, without parsing,
evaluation, or table creation. Currently accepts any non-empty string.
craig bot pushed a commit that referenced this issue Feb 24, 2025
140204: sql: add jsonpath type r=normanchenn a=normanchenn

As part of work to implement jsonpath queries, this PR implements the`jsonpath` type within CockroachDB. There is no parsing or evaluation added, and the `jsonpath` will currently accept any non-empty string. Indexes are not currently supported for `jsonpath` columns.

Part of: #22513
Release note (sql change): Adding jsonpath type, without parsing or evaluation. Currently accepts any non-empty string.

Co-authored-by: Norman Chen <[email protected]>
normanchenn added a commit to normanchenn/cockroach that referenced this issue Feb 25, 2025
This is a prototype for a minimal jsonpath parser. It currently supports
a small set of features: setting jsonpath mode (strict/lax), root ($),
key accessors (.key_name), and array wildcards ([*]). This is standalone
and doesn't integrate with the database.

Part of: cockroachdb#22513
Release note: None
iskettaneh pushed a commit to iskettaneh/cockroach that referenced this issue Feb 26, 2025
140204: sql: add jsonpath type r=normanchenn a=normanchenn

As part of work to implement jsonpath queries, this PR implements the`jsonpath` type within CockroachDB. There is no parsing or evaluation added, and the `jsonpath` will currently accept any non-empty string. Indexes are not currently supported for `jsonpath` columns.

Part of: cockroachdb#22513
Release note (sql change): Adding jsonpath type, without parsing or evaluation. Currently accepts any non-empty string.

Co-authored-by: Norman Chen <[email protected]>
normanchenn added a commit to normanchenn/cockroach that referenced this issue Feb 26, 2025
This is a prototype for a minimal jsonpath parser. It currently supports
a small set of features: setting jsonpath mode (strict/lax), root ($),
key accessors (.key_name), and array wildcards ([*]). This is standalone
and doesn't integrate with the database.

Part of: cockroachdb#22513
Release note: None
craig bot pushed a commit that referenced this issue Feb 26, 2025
141661: sql: add jsonpath parser r=normanchenn a=normanchenn

This is a prototype for a minimal jsonpath parser. It currently supports a small set of features: setting jsonpath mode (strict/lax), root ($), key accessors (.key_name), and array wildcards ([*]). This is standalone and doesn't integrate with the database.

Part of: #22513
Release note: None

142037: kvserver: cache the storeClockTimestamp every tick r=iskettaneh a=iskettaneh

Instead of calculating the time now() multiple times during the tick. This commit caches timestamp during the tick(), and then it gets reused instead of redoing the work again.

Preliminary testing showed  that this reduces the clock mutex contention by 50%.

Fixes: #142049
Release note: None

142043: roachtest: Update pgx block list r=spilchen a=spilchen

The pgx test v5.TestBeginReadOnly previously expected an error when executing:

```
BEGIN READ ONLY
CREATE TABLE foo(id serial primary key)
```

Due to the recent autocommit_before_ddl change in #141145, the transaction is auto-committed before the CREATE TABLE, allowing the command to succeed. This update adjusts the block list so that we ignore this failed test.

I will backport this in: #141987 #141851.

Epic: none
Release note: none

Closes #141903

Co-authored-by: Norman Chen <[email protected]>
Co-authored-by: Ibrahim Kettaneh <[email protected]>
Co-authored-by: Matt Spilchen <[email protected]>
@yuzefovich yuzefovich moved this from Backlog to Active in SQL Queries Mar 5, 2025
@normanchenn normanchenn added the meta-issue Contains a list of several other issues. label Mar 10, 2025
craig bot pushed a commit that referenced this issue Mar 12, 2025
142336: builtins: add jsonb_path_query r=normanchenn a=normanchenn

Previously, the `jsonpath` data type and parser were created and integrated with each other. This PR adds the `jsonb_path_query` function and a new `pkg/util/jsonpath/eval` package, adding an evaluation engine for jsonpath queries.

Informs: #22513.
Epic: None

Release note (sql change): Add the `jsonb_path_query` function, which takes in a JSON object and a Jsonpath query, and returns the resulting JSON object.

Co-authored-by: Norman Chen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-json JSON handling in SQL. A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-starter Might be suitable for a starter project for new employees or team members. meta-issue Contains a list of several other issues. T-sql-queries SQL Queries Team target-release-25.2.0
Projects
Status: Active
Development

No branches or pull requests

8 participants