Compatibility test for reading Impala parquet file from Java is failing #6
Comments
Thanks for reporting this. @nongli and I are working on this.
Has there been any progress on this front? Is there anything I could do to help out?
There was an incompatibility between parquet-1.0 and Impala for snappy. We have fixed it in 1.1. You can check out the latest parquet-compatibility code, and mvn test should pass.
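For anyone verifying the fix outside mvn test, here is a minimal read sketch against an Impala-written file. The file path is a placeholder, and the use of parquet-mr's example Group API (ParquetReader plus GroupReadSupport) is my assumption, not something the maintainers prescribed:

import org.apache.hadoop.fs.Path;
import parquet.example.data.Group;
import parquet.hadoop.ParquetReader;
import parquet.hadoop.example.GroupReadSupport;

public class ReadImpalaFile {
    public static void main(String[] args) throws Exception {
        // Placeholder path: point this at an Impala-written parquet file.
        Path file = new Path("/tmp/customer.impala.parquet");
        ParquetReader<Group> reader =
            new ParquetReader<Group>(file, new GroupReadSupport());
        Group record;
        while ((record = reader.read()) != null) {
            // With parquet 1.1 the snappy-compressed pages decode; on 1.0
            // this loop failed with ParquetDecodingException, as reported below.
            System.out.println(record);
        }
        reader.close();
    }
}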
Got the same problem:
REGISTER hdfs:///applications/libs/parquet/1.4.3/parquet-avro-1.4.3.jar
The root cause of the error is in the snappy decompression step of the read; the stack trace can be found below. Running the default tests in the compatibility module produces the error in TestImpalaCompatibility. This appears to be the root error:
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
at org.xerial.snappy.Snappy.uncompress(Snappy.java:467)
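As a quick sanity check (my own addition, not from the original report), a raw snappy round trip confirms the xerial native library works in isolation, which points at the page bytes rather than the JNI setup:

import org.xerial.snappy.Snappy;

public class SnappySanityCheck {
    public static void main(String[] args) throws Exception {
        byte[] input = "hello snappy".getBytes("UTF-8");
        // Round trip through raw snappy; this succeeds whenever the native
        // library is loaded correctly.
        byte[] restored = Snappy.uncompress(Snappy.compress(input));
        System.out.println(new String(restored, "UTF-8"));
        // FAILED_TO_UNCOMPRESS(5) is thrown only when the input buffer is
        // corrupt or not raw snappy data, as with the Impala-written pages.
    }
}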
The test was run on a MacBook Pro running OS X 10.8.3.
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b12)
Reading another file produced by Impala, we received the same error on both a Mac and a Linux server running Red Hat 6.0:
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.208 sec <<< FAILURE!
testReadFromImpala(parquet.compat.test.TestImpalaCompatibility) Time elapsed: 0.205 sec <<< ERROR!
parquet.io.ParquetDecodingException: Can not read value at 0 in block 0
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:219)
at parquet.hadoop.ParquetReader.read(ParquetReader.java:75)
at parquet.compat.test.ConvertUtils.convertParquetToCSV(ConvertUtils.java:134)
at parquet.compat.test.TestImpalaCompatibility.testReadFromImpala(TestImpalaCompatibility.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
at com.sun.proxy.$Proxy0.invoke(Unknown Source)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
Caused by: parquet.io.ParquetDecodingException: could not read page Page [id: 577, bytes.size=65534, valueCount=1716, uncompressedSize=65534] in col [name] BINARY
at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:497)
at parquet.column.impl.ColumnReaderImpl.getCurrentDefinitionLevel(ColumnReaderImpl.java:443)
at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:387)
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:215)
... 32 more
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
at org.xerial.snappy.Snappy.uncompress(Snappy.java:467)
at parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:67)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:83)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:77)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:184)
at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:488)
... 35 more
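One plausible way to reproduce FAILED_TO_UNCOMPRESS(5) outside parquet is a framing mismatch: snappy has both a framed stream format and a raw buffer format, and feeding one to the other's decompressor fails with exactly this error. The thread confirms a parquet-1.0/Impala snappy incompatibility but not its precise mechanism, so treat this as an illustrative sketch rather than the confirmed cause:

import java.io.ByteArrayOutputStream;
import org.xerial.snappy.Snappy;
import org.xerial.snappy.SnappyOutputStream;

public class FramingMismatchDemo {
    public static void main(String[] args) throws Exception {
        byte[] original = "parquet page payload".getBytes("UTF-8");

        // Compress with xerial's framed stream format (magic header plus
        // length-prefixed chunks)...
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        SnappyOutputStream out = new SnappyOutputStream(framed);
        out.write(original);
        out.close();

        // ...then decompress it as if it were a raw snappy buffer. The
        // framing bytes are not valid raw snappy, so this throws
        // java.io.IOException: FAILED_TO_UNCOMPRESS(5), the same error
        // seen in the stack traces above.
        Snappy.uncompress(framed.toByteArray());
    }
}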