BigQuery load fails with JSON array #5542

Open
turb opened this issue Jan 15, 2025 · 2 comments · May be fixed by #5544
Labels
bug Something isn't working

Comments

@turb
Contributor

turb commented Jan 15, 2025

Loading data into BigQuery with a JSON column fails when the actual value is a JSON array such as [ data1, data2 ]:

Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize value of type `com.google.api.services.bigquery.model.TableRow` from Array value (token `JsonToken.START_ARRAY`)
 at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 1, column: 1]
	com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59)
	com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1767)
	com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1541)
	com.fasterxml.jackson.databind.deser.std.StdDeserializer._deserializeFromArray(StdDeserializer.java:222)
	com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:457)
	com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:32)
	com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:342)
	com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4917)
	com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3860)
	com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3828)
	com.spotify.scio.bigquery.types.package$Json$.parse(package.scala:76)
	@BigQueryType.toTable <== line where there is the toTable annotation

It did work with JacksonNode, but I wonder if it is possible with TableRow.

@RustedBones
Contributor

Indeed, I overlooked the fact that a JSON root can be either

  • a JSON object (TableRow)
  • a JSON array
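
To illustrate the two cases, here is a minimal sketch using plain Jackson. It is not the scio code: `JsonRootDemo`, `asMap` and `asNode` are hypothetical names, and `java.util.Map` is used as a stand-in for `TableRow` on the assumption that both go through Jackson's `MapDeserializer` (as the stack trace above suggests for `TableRow`).

```scala
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import com.fasterxml.jackson.databind.exc.MismatchedInputException
import scala.util.Try

// Hypothetical demo object, not part of scio.
object JsonRootDemo {
  private val mapper = new ObjectMapper()

  // Map-typed target: a stand-in (assumption) for TableRow.
  // Only a JSON object root can be bound to it.
  def asMap(json: String): Try[java.util.Map[String, AnyRef]] =
    Try(mapper.readValue(json, classOf[java.util.Map[String, AnyRef]]))

  // Tree-typed target: accepts any JSON root (object, array, scalar).
  def asNode(json: String): Try[JsonNode] =
    Try(mapper.readTree(json))

  def main(args: Array[String]): Unit = {
    val obj = """{"a": 1}"""
    val arr = """[1, 2]"""

    // Object root binds to a map.
    assert(asMap(obj).isSuccess)
    // Array root cannot be bound to a map-typed target.
    assert(asMap(arr).failed.toOption.exists(_.isInstanceOf[MismatchedInputException]))
    // A tree model handles both roots.
    assert(asNode(obj).toOption.exists(_.isObject))
    assert(asNode(arr).toOption.exists(_.isArray))
  }
}
```

Parsing into a tree type first and only then deciding how to represent the value is one way to accept both roots; whether that fits the `Json` type in scio is left open here.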

@RustedBones RustedBones added the bug Something isn't working label Jan 15, 2025
@turb
Contributor Author

turb commented Jan 16, 2025

Looking into it, I now remember why I came up with JacksonNode.

Beam is using an unmaintained BigQuery client. I could not find anything in it that deals with JSON fields, so I assumed a JSON value is stored directly as a plain JSON element in its representation tree.

Also, the BigQuery documentation states:

BigQuery supports the JSON type even if schema information is not known at the time of ingestion. A field that is declared as JSON type is loaded with the raw JSON values.

Beam does not seem to really care about it.

The thing is, google-api-services-bigquery relies on the google-http-client JsonFactory, which is backed by either Jackson or Gson. I tried JacksonNode and it worked like a charm.

I don't know what the issue with it was, but maybe a Gson JsonElement could work instead?

@RustedBones RustedBones linked a pull request Jan 16, 2025 that will close this issue
2 participants