Py4JJavaError: An error occurred while calling o2951.fullAnnotateJava. #13903

Ededu1984 · 2023-07-26T14:50:02Z

Is there an existing issue for this?

I have searched the existing issues and did not find a match.

Who can help?

I'm trying to reproduce the code from this notebook:

https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/TRANSLATION_MARIAN.ipynb#scrollTo=EYf_9sXDXR4t

My code:

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import SentenceDetectorDLModel, MarianTransformer

# Assumes an active SparkSession named `spark`, as provided by a Databricks notebook
documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")

sentencerDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_mt_it_en", "xx") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

marian_pipeline = Pipeline(stages=[documentAssembler, sentencerDL, marian])
# Fit on an empty DataFrame; the pretrained stages need no training data
light_pipeline = LightPipeline(marian_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))

result = light_pipeline.fullAnnotate("""La Gioconda è un dipinto ad olio del XVI secolo creato da Leonardo. Si tiene al Louvre di Parigi.""")

The error

Py4JJavaError Traceback (most recent call last)
File :16
13 marian_pipeline = Pipeline(stages=[documentAssembler, sentencerDL, marian])
14 light_pipeline = LightPipeline(marian_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))
---> 16 result = light_pipeline.fullAnnotate("""La Gioconda è un dipinto ad olio del XVI secolo creato da Leonardo. Si tiene al Louvre di Parigi.""")

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-8bb6e45d-4a31-48b9-8f1c-a3e9553fddde/lib/python3.9/site-packages/sparknlp/base/light_pipeline.py:201, in LightPipeline.fullAnnotate(self, target, optional_target)
199 if optional_target == "":
200 if self.__isTextInput(target):
--> 201 result = self.__fullAnnotateText(target)
202 elif self.__isAudioInput(target):
203 result = self.__fullAnnotateAudio(target)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-8bb6e45d-4a31-48b9-8f1c-a3e9553fddde/lib/python3.9/site-packages/sparknlp/base/light_pipeline.py:243, in LightPipeline.__fullAnnotateText(self, target)
240 if type(target) is str:
241 target = [target]
--> 243 for annotations_result in self._lightPipeline.fullAnnotateJava(target):
244 result.append(self.__buildStages(annotations_result))
245 return result

File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
1315 command = proto.CALL_COMMAND_NAME +\
1316 self.command_header +\
1317 args_command +\
1318 proto.END_COMMAND_PART
1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
1322 answer, self.gateway_client, self.target_id, self.name)
1324 for temp_arg in temp_args:
1325 temp_arg._detach()

File /databricks/spark/python/pyspark/errors/exceptions.py:228, in capture_sql_exception.<locals>.deco(*a, **kw)
226 def deco(*a: Any, **kw: Any) -> Any:
227 try:
--> 228 return f(*a, **kw)
229 except Py4JJavaError as e:
230 converted = convert_exception(e.java_exception)

File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o2951.fullAnnotateJava.
: java.lang.ClassCastException: java.util.ArrayList cannot be cast to [Ljava.lang.String;
at com.johnsnowlabs.nlp.annotators.seq2seq.MarianTransformer.batchAnnotate(MarianTransformer.scala:352)
at com.johnsnowlabs.nlp.LightPipeline.processBatchedAnnotator(LightPipeline.scala:202)
at com.johnsnowlabs.nlp.LightPipeline.processAnnotatorModel(LightPipeline.scala:184)
at com.johnsnowlabs.nlp.LightPipeline.$anonfun$fullAnnotateInternal$1(LightPipeline.scala:118)
at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
at com.johnsnowlabs.nlp.LightPipeline.fullAnnotateInternal(LightPipeline.scala:100)
at com.johnsnowlabs.nlp.LightPipeline.fullAnnotate(LightPipeline.scala:49)
at com.johnsnowlabs.nlp.LightPipeline.fullAnnotateJava(LightPipeline.scala:303)
at com.johnsnowlabs.nlp.LightPipeline.$anonfun$fullAnnotateJava$5(LightPipeline.scala:342)
at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:659)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
at scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:153)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

I'm using Spark NLP in a Databricks environment.

Spark NLP version 4.2.8
Apache Spark version: 3.3.2

What are you working on?

Text translation

Current Behavior

Translate the text

Expected Behavior

Translate the text

Steps To Reproduce

I don't have the link

Spark NLP version and Apache Spark

Spark NLP version 4.2.8
Apache Spark version: 3.3.2

Type of Spark Application

spark-shell

Java Version

No response

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response

maziyarpanahi · 2023-07-30T16:18:11Z

Hi @Ededu1984

This is a bug that we will fix in the next release

noga-eps · 2023-07-31T09:17:57Z

@maziyarpanahi Do you know when the next release is?

maziyarpanahi · 2023-07-31T09:21:32Z

@noga-eps We have scheduled the Spark NLP 5.0.2 release for 2-3 days from now (definitely by the end of this week).

Ededu1984 · 2023-08-11T14:07:06Z

Hi @maziyarpanahi,

I'm getting this error:

java.lang.ClassCastException: java.util.ArrayList cannot be cast to [Ljava.lang.String;

This is the code:

documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")

sentencerDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx").setInputCols(["document"]).setOutputCol("sentences")

marian = MarianTransformer.pretrained("opus_mt_mul_en", "xx") \
    .setInputCols(["sentences"]) \
    .setOutputCol("translation") \
    .setLangId("deu")

sdf = spark.createDataFrame([[">>deu<< Hallo wie geht es dir Ich bin hubert aus Deutschland"],
[">>fra<< Wikipédia est un projet d'encyclopédie collective en ligne, universelle, multilingue et fonctionnant sur le principe du wiki. Ce projet vise à offrir un contenu librement réutilisable, objectif et vérifiable, que chacun peut modifier et améliorer."]]).toDF("text")

marian_pipeline = Pipeline(stages=[documentAssembler, sentencerDL, marian])
light_pipeline = LightPipeline(marian_pipeline.fit(sdf))

Config on Databricks:

SparkSession - hive
Version: v3.3.2
AppName: Databricks Shell

Spark NLP version: 5.0.2
Apache Spark version: 3.3.2

The code works perfectly in Colab, but I get this error when I try to run it on Databricks.

Runtime version: 13.2 ML (includes Apache Spark 3.4.0, GPU, Scala 2.12)

maziyarpanahi · 2023-08-15T10:30:11Z

@Ededu1984 This is a common mistake on Databricks: changing the PyPI version to 5.0.2 is not enough. You must also change the Maven version to 5.0.2.

https://github.com/JohnSnowLabs/spark-nlp#databricks-cluster (step 3)

As you can see, we no longer have that error in 5.0.2: https://colab.research.google.com/drive/126cHLX87KbkF3aAepri_AgtawLPLrUeg?usp=sharing

You just have to make sure your Maven package (the actual core library) also points to 5.0.2.
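
As a rough illustration of the check described above, here is a minimal sketch that compares the Python package version with the version in a Maven coordinate. The helper names are hypothetical and not part of Spark NLP, and the 4.2.8 coordinate is only an assumed example of a cluster whose Maven library was not upgraded.

```python
# Minimal sketch: check that the PyPI package and the Maven JAR name the same release.
# The helper names below are hypothetical; they are not part of Spark NLP.

def jar_version_from_coordinate(coordinate: str) -> str:
    """Extract the version from a Maven coordinate of the form group:artifact:version."""
    parts = coordinate.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected group:artifact:version, got {coordinate!r}")
    return parts[2]


def versions_match(python_pkg_version: str, maven_coordinate: str) -> bool:
    """Return True when the Python package and the JAR are the same release."""
    return python_pkg_version.strip() == jar_version_from_coordinate(maven_coordinate)


# Assumed example: PyPI upgraded to 5.0.2 while the Maven library stayed on 4.2.8.
print(versions_match("5.0.2", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8"))  # False
# Both sides on 5.0.2, which is the setup recommended above.
print(versions_match("5.0.2", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2"))  # True
```

On a real cluster, the first argument could come from `sparknlp.version()`, and the coordinate would be the one installed as a Maven library on the cluster.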

Ededu1984 added the question label Jul 26, 2023

Ededu1984 assigned maziyarpanahi Jul 26, 2023

maziyarpanahi added bug fixed-next-release and removed question labels Jul 30, 2023

maziyarpanahi linked a pull request Jul 30, 2023 that will close this issue

SPARKNLP-873 Issue with MarianTransformers models #13908

Merged
