
ParquetType throws error writing Optional empty list #825

Open
clairemcginty opened this issue Aug 31, 2023 · 1 comment

@clairemcginty
Contributor

clairemcginty commented Aug 31, 2023

ParquetType can write empty lists when the list field is top-level, but not when it's wrapped in an Option. Repro:

$ sbt parquet/test:console
scala> import magnolify.parquet._
scala> import magnolify.parquet.ParquetArray.AvroCompat._

scala> case class RegularList(f: List[Int])
scala> case class OptionalList(f: Option[List[Int]])

scala> val writerRegularList = ParquetType[RegularList].writeBuilder(new TestOutputFile()).build()
scala> val writerOptionalList = ParquetType[OptionalList].writeBuilder(new TestOutputFile()).build()

// Succeeds
scala> writerRegularList.write(RegularList(List()))

// Throws error
scala> writerOptionalList.write(OptionalList(Some(List())))
org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
  at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:329)
  at org.apache.parquet.io.RecordConsumerLoggingWrapper.endField(RecordConsumerLoggingWrapper.java:162)
  at magnolify.parquet.ParquetField$$anon$9.write(ParquetField.scala:335)
  at magnolify.parquet.ParquetField.writeGroup(ParquetField.scala:58)
  at magnolify.parquet.ParquetField.writeGroup$(ParquetField.scala:54)
  at magnolify.parquet.ParquetField$$anon$9.writeGroup(ParquetField.scala:309)
  at magnolify.parquet.ParquetField$$anon$7.$anonfun$write$2(ParquetField.scala:290)
  at magnolify.parquet.ParquetField$$anon$7.$anonfun$write$2$adapted(ParquetField.scala:290)
  at scala.Option.foreach(Option.scala:437)
  at magnolify.parquet.ParquetField$$anon$7.write(ParquetField.scala:290)
  at magnolify.parquet.ParquetField$$anon$7.write(ParquetField.scala:280)
  at magnolify.parquet.ParquetField.writeGroup(ParquetField.scala:58)
  at magnolify.parquet.ParquetField.writeGroup$(ParquetField.scala:54)
  at magnolify.parquet.ParquetField$$anon$7.writeGroup(ParquetField.scala:280)
  at magnolify.parquet.ParquetField$$anon$3.$anonfun$write$1(ParquetField.scala:134)
  at magnolify.parquet.ParquetField$$anon$3.$anonfun$write$1$adapted(ParquetField.scala:129)
  at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
  at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
  at magnolify.parquet.ParquetField$$anon$3.write(ParquetField.scala:129)
  at magnolify.parquet.ParquetType$$anon$1.write(ParquetType.scala:99)
  at magnolify.parquet.ParquetType$WriteSupport.write(ParquetType.scala:203)
  at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
  ... 59 elided
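The trace hints at why the two cases differ: the record-level writer (ParquetField$$anon$3) seems to skip an empty top-level List, while the Option writer (ParquetField$$anon$7) only skips None. So Some(List()) reaches the list writer, which opens a field, writes nothing, and closes it. The sketch below is a toy model of that reading, assuming this skip-if-empty behaviour. ToyConsumer, writeRegular and writeOptional are hypothetical names, not Magnolify or Parquet APIs, and the model throws IllegalStateException where real Parquet throws ParquetEncodingException.

```scala
import scala.collection.mutable.ListBuffer

// Toy model only: NOT Parquet's RecordConsumer or Magnolify's ParquetField.
object EmptyFieldRuleModel {
  // Mimics the Parquet rule "empty fields are illegal, the field should be
  // omitted completely instead": endField fails if no value was written.
  final class ToyConsumer {
    private var valuesInField = 0
    val events: ListBuffer[String] = ListBuffer.empty[String]

    def startField(name: String): Unit = {
      valuesInField = 0
      events += s"start($name)"
    }

    def addInteger(v: Int): Unit = {
      valuesInField += 1
      events += s"int($v)"
    }

    def endField(name: String): Unit = {
      // Real Parquet throws ParquetEncodingException at this point.
      if (valuesInField == 0)
        throw new IllegalStateException(s"empty field '$name' is illegal")
      events += s"end($name)"
    }
  }

  // Writes one list-valued field; an empty list opens and closes it with no values.
  private def writeList(c: ToyConsumer, name: String, xs: List[Int]): Unit = {
    c.startField(name)
    xs.foreach(c.addInteger)
    c.endField(name)
  }

  // Assumed top-level behaviour: an empty List is omitted entirely.
  def writeRegular(c: ToyConsumer, f: List[Int]): Unit =
    if (f.nonEmpty) writeList(c, "f", f)

  // Assumed Option behaviour: only None is omitted, so Some(Nil) hits the rule.
  def writeOptional(c: ToyConsumer, f: Option[List[Int]]): Unit =
    f.foreach(xs => writeList(c, "f", xs))
}
```

In this model writeRegular(c, Nil) emits no events, while writeOptional(c, Some(Nil)) fails at endField, mirroring the two REPL results above.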

On the user side, the easiest workaround is probably to drop the Option wrapper around the List, but this is worth taking a look at in Magnolify imo.
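A minimal sketch of that workaround, assuming you control the case classes: map the Option[List[Int]] field to a plain List[Int] before writing, since the top-level list case already works. The helpers toRegular and toOptional are hypothetical, and treating an empty list as None on the way back is an assumed convention, not something Magnolify does for you.

```scala
// Hypothetical user-side workaround: avoid Option[List[...]] in the written type.
object DropOptionWorkaround {
  final case class RegularList(f: List[Int])
  final case class OptionalList(f: Option[List[Int]])

  // None and Some(Nil) both become an empty list. This loses the distinction
  // between "absent" and "present but empty".
  def toRegular(o: OptionalList): RegularList = RegularList(o.f.getOrElse(Nil))

  // Assumed read-side convention: an empty list maps back to None.
  def toOptional(r: RegularList): OptionalList =
    OptionalList(if (r.f.isEmpty) None else Some(r.f))
}
```

In the repro, this corresponds to writing RegularList(o.f.getOrElse(Nil)) with writerRegularList instead of writing the OptionalList directly. The round trip is lossy by design: Some(List()) comes back as None.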

@RustedBones
Contributor

Strange, I thought there was a check when transforming the field presence (from expected to REPEATED or OPTIONAL), and that it would throw when changing the presence of an already REPEATED or OPTIONAL field here.

Looks like the schema construction is not called in your case, but the intent was not to support such classes.
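The check described above can be modelled as: presence may only be changed on a field that is still REQUIRED, so wrapping an already REPEATED (List) or OPTIONAL (Option) field in Option is rejected when the schema is built. This toy model follows the comment's description rather than Magnolify's actual Schema code; Repetition, Field, setRepetition, listOf and optionOf are hypothetical names, and it ignores the group-based list encoding used by ParquetArray.AvroCompat.

```scala
// Toy model of a schema-construction check that rejects nested presence,
// e.g. Option[List[Int]] where List[Int] is already REPEATED.
object RepetitionCheckModel {
  sealed trait Repetition
  case object Required extends Repetition
  case object Optional extends Repetition
  case object Repeated extends Repetition

  final case class Field(name: String, repetition: Repetition)

  // Changing presence is only allowed from Required; anything else throws
  // IllegalArgumentException (via require).
  def setRepetition(f: Field, r: Repetition): Field = {
    require(f.repetition == Required,
      s"cannot make ${f.name} $r: it is already ${f.repetition}")
    f.copy(repetition = r)
  }

  // List[T] turns a Required field into a Repeated one.
  def listOf(name: String): Field = setRepetition(Field(name, Required), Repeated)

  // Option[T] turns a Required field into an Optional one.
  def optionOf(inner: Field): Field = setRepetition(inner, Optional)
}
```

Under this model, optionOf(listOf("f")) fails while the schema is being derived rather than at write time, which is the stated intent; per the comment, that code path does not seem to be reached in the repro.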
