Compatibility test for reading impala parquet file from java is failing #6

Open
jaltekruse opened this issue Aug 14, 2013 · 4 comments

@jaltekruse

The root cause of the error was in the Snappy decompression step of the read; the stack trace can be found below.

Running the default tests in the compatibility module produces the error in TestImpalaCompatibility.

This seems to be the root error:
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
at org.xerial.snappy.Snappy.uncompress(Snappy.java:467)

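For reference, the failing call is snappy-java's raw uncompress, and the same FAILED_TO_UNCOMPRESS(5) error typically comes out of it whenever the input bytes are not a single raw Snappy block. A minimal, self-contained sketch (the garbage input below is made up purely to trigger the error; it is not taken from the Parquet file):

import org.xerial.snappy.Snappy;

public class SnappyUncompressSketch {
    public static void main(String[] args) throws Exception {
        // A bare Snappy block round-trips fine through snappy-java.
        byte[] block = Snappy.compress("hello parquet".getBytes("UTF-8"));
        System.out.println(new String(Snappy.uncompress(block), "UTF-8"));

        // Hypothetical input standing in for a differently framed page:
        // bytes that are not a raw Snappy block typically fail with
        // java.io.IOException: FAILED_TO_UNCOMPRESS(5), as in the trace above.
        byte[] notARawBlock = "definitely not a snappy block".getBytes("UTF-8");
        Snappy.uncompress(notARawBlock);
    }
}
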
The test was run on a MacBook Pro running OS X 10.8.3.
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b12)

Reading another file produced using Impala, we received the same error on both a Mac and a Linux server running Red Hat 6.0:
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.208 sec <<< FAILURE!
testReadFromImpala(parquet.compat.test.TestImpalaCompatibility) Time elapsed: 0.205 sec <<< ERROR!
parquet.io.ParquetDecodingException: Can not read value at 0 in block 0
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:219)
at parquet.hadoop.ParquetReader.read(ParquetReader.java:75)
at parquet.compat.test.ConvertUtils.convertParquetToCSV(ConvertUtils.java:134)
at parquet.compat.test.TestImpalaCompatibility.testReadFromImpala(TestImpalaCompatibility.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
at com.sun.proxy.$Proxy0.invoke(Unknown Source)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
Caused by: parquet.io.ParquetDecodingException: could not read page Page [id: 577, bytes.size=65534, valueCount=1716, uncompressedSize=65534] in col [name] BINARY
at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:497)
at parquet.column.impl.ColumnReaderImpl.getCurrentDefinitionLevel(ColumnReaderImpl.java:443)
at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:387)
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:215)
... 32 more
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
at org.xerial.snappy.Snappy.uncompress(Snappy.java:467)
at parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:67)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:83)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:77)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:184)
at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:488)
... 35 more

@aniket486
Contributor

Thanks for reporting this. @nongli and I are working on this.

@jaltekruse
Author

Has there been any progress on this front? Is there anything I could do to help out?

@aniket486
Contributor

There was an incompatibility between parquet-1.0 and Impala for Snappy. We have fixed it in 1.1. You can check out the latest parquet-compatibility code, and mvn test should pass.
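
For anyone verifying the fix outside the test suite, a minimal read sketch along the lines of what the compatibility test does (the file path is a placeholder, and GroupReadSupport/Group from the parquet.hadoop.example and parquet.example.data packages are used here only for illustration):

import org.apache.hadoop.fs.Path;

import parquet.example.data.Group;
import parquet.hadoop.ParquetReader;
import parquet.hadoop.example.GroupReadSupport;

public class ReadImpalaParquet {
    public static void main(String[] args) throws Exception {
        // Placeholder path to a Parquet file written by Impala.
        Path file = new Path("/tmp/impala-written.parquet");
        ParquetReader<Group> reader = new ParquetReader<Group>(file, new GroupReadSupport());
        try {
            // With the Snappy fix in place, this loop should no longer
            // throw FAILED_TO_UNCOMPRESS while reading pages.
            Group record;
            while ((record = reader.read()) != null) {
                System.out.println(record);
            }
        } finally {
            reader.close();
        }
    }
}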

@seregasheypak

Got the same problem:

Caused by: parquet.io.ParquetDecodingException: could not read page Page [id: 4409, bytes.size=9, valueCount=9046243, uncompressedSize=9] in col [actual_duration] INT64
at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:546)
at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:509)
at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:560)
at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:355)
at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:63)
at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:58)
at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:265)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:59)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:73)
at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:110)
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
... 14 more
Caused by: java.io.EOFException
at parquet.bytes.BytesUtils.readIntLittleEndianOnOneByte(BytesUtils.java:76)
at parquet.column.values.dictionary.DictionaryValuesReader.initFromPage(DictionaryValuesReader.java:56)
at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:544)
... 24 more

Used Impala 1.2.4 on CDH 4.6 to prepare the file (INSERT INTO TABLE)
and read it using Pig with Parquet 1.4.3:

REGISTER hdfs:///applications/libs/parquet/1.4.3/parquet-avro-1.4.3.jar
REGISTER hdfs:///applications/libs/parquet/1.4.3/parquet-column-1.4.3.jar
REGISTER hdfs:///applications/libs/parquet/1.4.3/parquet-common-1.4.3.jar
REGISTER hdfs:///applications/libs/parquet/1.4.3/parquet-encoding-1.4.3.jar
REGISTER hdfs:///applications/libs/parquet/1.4.3/parquet-hadoop-1.4.3.jar
REGISTER hdfs:///applications/libs/parquet/1.4.3/parquet-pig-1.4.3.jar
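
In case it helps narrow this down, a small footer-dump sketch that prints the codec and encodings of each column chunk (e.g. to see how actual_duration was written). This is only a sketch against the parquet-hadoop 1.4.x metadata API as I understand it; the file path comes from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

import parquet.hadoop.ParquetFileReader;
import parquet.hadoop.metadata.BlockMetaData;
import parquet.hadoop.metadata.ColumnChunkMetaData;
import parquet.hadoop.metadata.ParquetMetadata;

public class DumpParquetFooter {
    public static void main(String[] args) throws Exception {
        // args[0]: path to the Impala-written Parquet file (local or HDFS).
        ParquetMetadata footer = ParquetFileReader.readFooter(new Configuration(), new Path(args[0]));
        for (BlockMetaData block : footer.getBlocks()) {
            for (ColumnChunkMetaData column : block.getColumns()) {
                // Print per-column codec, encodings, and value count from the footer.
                System.out.println(column.getPath()
                        + " codec=" + column.getCodec()
                        + " encodings=" + column.getEncodings()
                        + " values=" + column.getValueCount());
            }
        }
    }
}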
