500 error (OptimisticLockException) when uploading multiple tabular files #11265

Open · pdurbin opened this issue Feb 14, 2025 · 24 comments

@pdurbin (Member) commented Feb 14, 2025

With @JR-1991 and me looking on, @kmika11 was able to reproduce the following issue on https://demo.dataverse.org running Dataverse 6.5 (technically "version":"6.5","build":"iqss-3"), but she's having the same problem on Harvard Dataverse (which runs the same version).

Katie is using the latest release of python-dvuploader: v0.2.3.

Her notebook is here: https://github.com/kmika11/harvard-college-observatory-announcements/blob/32a0b9aa287eb20f0aeef5853ffec2bb1ec95131/curation_script_HCO.ipynb
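
For context, a minimal sketch of the kind of python-dvuploader call being made (the DVUploader constructor and the upload() keyword arguments match the traceback below; File(filepath=...) and all the concrete values are illustrative assumptions, not taken from Katie's notebook):

# A sketch of a python-dvuploader v0.2.3 upload, assuming the library's
# File(filepath=...) helper; the DOI, URL, and token are placeholders.
from dvuploader import DVUploader, File

files = [
    File(filepath="HCOAnnouncement0001_0001_a.innodata.csv"),
    File(filepath="HCOAnnouncement0001_0001_b.innodata.csv"),
]

uploader = DVUploader(files=files)
uploader.upload(
    persistent_id="doi:10.70122/FK2/0HMSY2",   # the demo dataset above
    dataverse_url="https://demo.dataverse.org",
    api_token="XXXX-XXXX",                      # placeholder token
    n_parallel_uploads=4,
)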

The script creates a dataset (https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/0HMSY2&version=DRAFT) with six files, three of which are tabular, before it unexpectedly stops uploading the remaining files from the list of 60 total files:

[screenshot: the dataset's file list]

Here's a screenshot of the error Katie sees in her notebook:

[screenshot: the error in the notebook]

Notebook traceback:
ClientResponseError                       Traceback (most recent call last)
Cell In[17], line 5
      2 collection_pids = {}
      4 for dataset_name, dataset in test_set:
----> 5     pid = dataset.upload(dataverse_name = g_dataverse_collection, n_parallel = 4)
      6     collection_pids[dataset_name] = pid

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/easyDataverse/dataset.py:216, in Dataset.upload(self, dataverse_name, n_parallel)
    204 """Uploads a given dataset to a Dataverse installation specified in the environment variable.
    205
    206 Args:
   (...)
    211     str: The identifier of the uploaded dataset.
    212 """
    214 self._validate_required_fields()
--> 216 self.p_id = upload_to_dataverse(
    217     json_data=self.dataverse_json(),
    218     dataverse_name=dataverse_name,
    219     files=self.files,
    220     p_id=self.p_id,
    221     DATAVERSE_URL=str(self.DATAVERSE_URL),
    222     API_TOKEN=str(self.API_TOKEN),
    223     n_parallel=n_parallel,
    224 )
    226 return self.p_id

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/easyDataverse/uploader.py:59, in upload_to_dataverse(json_data, dataverse_name, files, p_id, n_parallel, DATAVERSE_URL, API_TOKEN)
     56 # Get response data
     57 p_id = response.json()["data"]["persistentId"]
---> 59 _uploadFiles(
     60     files=files,
     61     p_id=p_id,
     62     api=api,
     63     n_parallel=n_parallel,
     64 )  # type: ignore
     66 console = Console()
     67 url = urljoin(DATAVERSE_URL, f"dataset.xhtml?persistentId={p_id}")

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/easyDataverse/uploader.py:107, in _uploadFiles(files, p_id, api, n_parallel)
    104     return
    106 dvuploader = DVUploader(files=files)
--> 107 dvuploader.upload(
    108     persistent_id=p_id,
    109     dataverse_url=api.base_url,
    110     api_token=api.api_token,
    111     n_parallel_uploads=n_parallel,
    112 )

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/dvuploader/dvuploader.py:133, in DVUploader.upload(self, persistent_id, dataverse_url, api_token, n_parallel_uploads, force_native)
    131 else:
    132     with progress:
--> 133         asyncio.run(
    134             direct_upload(
    135                 files=files,
    136                 dataverse_url=dataverse_url,
    137                 api_token=api_token,
    138                 persistent_id=persistent_id,
    139                 pbars=pbars,
    140                 progress=progress,
    141                 n_parallel_uploads=n_parallel_uploads,
    142             )
    143         )
    145 if self.verbose:
    146     rich.print("\n[bold italic white]✅ Upload complete\n")

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/nest_asyncio.py:30, in _patch_asyncio.<locals>.run(main, debug)
     28 task = asyncio.ensure_future(main)
     29 try:
---> 30     return loop.run_until_complete(task)
     31 finally:
     32     if not task.done():

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/nest_asyncio.py:98, in _patch_loop.<locals>.run_until_complete(self, future)
     95 if not f.done():
     96     raise RuntimeError(
     97         'Event loop stopped before Future completed.')
---> 98 return f.result()

File ~/anaconda3/envs/curation/lib/python3.12/asyncio/futures.py:203, in Future.result(self)
    201 self.__log_traceback = False
    202 if self._exception is not None:
--> 203     raise self._exception.with_traceback(self._exception_tb)
    204 return self._result

File ~/anaconda3/envs/curation/lib/python3.12/asyncio/tasks.py:314, in Task.__step_run_and_handle_result(***failed resolving arguments***)
    310 try:
    311     if exc is None:
    312         # We use the `send` method directly, because coroutines
    313         # don't have `__iter__` and `__next__` methods.
--> 314         result = coro.send(None)
    315     else:
    316         result = coro.throw(exc)

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/dvuploader/directupload.py:91, in direct_upload(files, dataverse_url, api_token, persistent_id, progress, pbars, n_parallel_uploads)
     86 connector = aiohttp.TCPConnector(limit=2)
     87 async with aiohttp.ClientSession(
     88     headers=headers,
     89     connector=connector,
     90 ) as session:
---> 91     await _add_files_to_ds(
     92         session=session,
     93         files=files,
     94         dataverse_url=dataverse_url,
     95         pid=persistent_id,
     96         progress=progress,
     97         pbar=pbar,
     98     )

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/dvuploader/directupload.py:520, in _add_files_to_ds(session, dataverse_url, pid, files, progress, pbar)
    512 replace_json_data = _prepare_registration(files, use_replace=True)
    514 await _multipart_json_data_request(
    515     session=session,
    516     json_data=novel_json_data,
    517     url=novel_url,
    518 )
--> 520 await _multipart_json_data_request(
    521     session=session,
    522     json_data=replace_json_data,
    523     url=replace_url,
    524 )
    526 progress.update(pbar, advance=1)

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/dvuploader/directupload.py:581, in _multipart_json_data_request(json_data, url, session)
    578 json_part.set_content_disposition("form-data", name="jsonData")
    580 async with session.post(url, data=writer) as response:
--> 581     response.raise_for_status()

File ~/anaconda3/envs/curation/lib/python3.12/site-packages/aiohttp/client_reqrep.py:1070, in ClientResponse.raise_for_status(self)
   1068 assert self.reason is not None
   1069 self.release()
-> 1070 raise ClientResponseError(
   1071     self.request_info,
   1072     self.history,
   1073     status=self.status,
   1074     message=self.reason,
   1075     headers=self.headers,
   1076 )

ClientResponseError: 500, message='Internal Server Error', url=URL('https://demo.dataverse.org/api/datasets/:persistentId/replaceFiles?persistentId=doi:10.70122/FK2/0HMSY2')

On the backend, we get a 500 error because of an OptimisticLockException:

jakarta.ejb.EJBException: Exception [EclipseLink-5010] (Eclipse Persistence Services - 4.0.1.payara-p2.v202310250827): org.eclipse.persistence.exceptions.OptimisticLockException Exception Description: The object [[DatasetVersion id:283666]] cannot be merged because it has changed or been deleted since it was last read. Class> edu.harvard.iq.dataverse.DatasetVersion

The full OptimisticLockException and the surrounding stacktrace are below.

500 error (OptimisticLockException)
[2025-02-07T12:49:50.966-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950590966] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:50.966-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950590966] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 1","directoryLabel":"","mimeType":"text/plain","categories":["Data","BLATHWAYT’S COMET"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1883c68-ef4a116c5802","fileName":"HCOAnnouncement0001_0001_b.innodata.csv","checksum":{"@type":"MD5","@value":"cb9610708db24d317c9c9e6dd28352f7"}}]]

[2025-02-07T12:49:50.966-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950590966] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:50.968-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950590968] [levelValue: 800] [[
  HCOAnnouncement0001_0001_b.innodata.csv is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:49:51.018-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591018] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 173]]

[2025-02-07T12:49:51.020-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591020] [levelValue: 800] [[
  Incrementing recorded storage use by 173 bytes for dataset 2527473]]

[2025-02-07T12:49:51.025-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591025] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:51.025-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591025] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 1","directoryLabel":"","mimeType":"text/plain","categories":["Data","BLATHWAYT’S COMET"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1883c68-5a9bb8720c6e","fileName":"HCOAnnouncement0001_0001_a.innodata.csv","checksum":{"@type":"MD5","@value":"93f66e7af04588d68170f3ab68eeed90"}}]]

[2025-02-07T12:49:51.025-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591025] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:51.027-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591027] [levelValue: 800] [[
  HCOAnnouncement0001_0001_a.innodata.csv is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:49:51.062-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591062] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 471]]

[2025-02-07T12:49:51.063-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591063] [levelValue: 800] [[
  Incrementing recorded storage use by 471 bytes for dataset 2527473]]

[2025-02-07T12:49:51.067-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591067] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:51.067-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591067] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 1","directoryLabel":"","mimeType":"text/plain","categories":["Data"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1883c69-2dcf02db2e11","fileName":"HCOAnnouncement0001_0001.innodata.txt","checksum":{"@type":"MD5","@value":"6ff9b3f87b23de88c3e816b861df7e08"}}]]

[2025-02-07T12:49:51.067-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591067] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:51.069-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591069] [levelValue: 800] [[
  HCOAnnouncement0001_0001.innodata.txt is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:49:51.104-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591104] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 870]]

[2025-02-07T12:49:51.104-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591104] [levelValue: 800] [[
  Incrementing recorded storage use by 870 bytes for dataset 2527473]]

[2025-02-07T12:49:51.109-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591109] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:51.109-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591109] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 1","directoryLabel":"","mimeType":"text/plain","categories":["Data"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1883c64-ae29a10f3e39","fileName":"HCOAnnouncement0001_0001.innodata.xml","checksum":{"@type":"MD5","@value":"bedeeec6772675d1f31b309813df4d08"}}]]

[2025-02-07T12:49:51.109-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591109] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:51.111-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591111] [levelValue: 800] [[
  HCOAnnouncement0001_0001.innodata.xml is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:49:51.143-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591143] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 24597]]

[2025-02-07T12:49:51.143-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591143] [levelValue: 800] [[
  Incrementing recorded storage use by 24597 bytes for dataset 2527473]]

[2025-02-07T12:49:51.147-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591147] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:51.147-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591147] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 1","directoryLabel":"","mimeType":"text/plain","categories":["Data"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1883cb7-49c54148b0f6","fileName":"HCOAnnouncement0001_0001.innodata.jpg","checksum":{"@type":"MD5","@value":"715234dc22d6dcaa3e2c17beca906e56"}}]]

[2025-02-07T12:49:51.147-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591147] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:49:51.149-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591149] [levelValue: 800] [[
  HCOAnnouncement0001_0001.innodata.jpg is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:49:51.182-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591182] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 186234]]

[2025-02-07T12:49:51.183-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591183] [levelValue: 800] [[
  Incrementing recorded storage use by 186234 bytes for dataset 2527473]]

[2025-02-07T12:49:51.187-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950591187] [levelValue: 900] [[
  Dataset not locked for EditInProgress ]]

[2025-02-07T12:49:54.048-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1738950594048] [levelValue: 800] [[
  Ingest of 2 tabular data file(s) is in progress.]]

[2025-02-07T12:49:54.118-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=2877 _ThreadName=orb-thread-pool-1 (pool #1): worker-4] [timeMillis: 1738950594118] [levelValue: 900] [[
  Ingest failure (IO Exception): Invalid header row. One of the cells is empty..]]

[2025-02-07T12:49:54.121-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestMessageBean] [tid: _ThreadID=2877 _ThreadName=orb-thread-pool-1 (pool #1): worker-4] [timeMillis: 1738950594121] [levelValue: 900] [[
  Error occurred during ingest job for file id 2527478!]]

[2025-02-07T12:49:54.196-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=2877 _ThreadName=orb-thread-pool-1 (pool #1): worker-4] [timeMillis: 1738950594196] [levelValue: 900] [[
  Ingest failure (Exception class java.lang.IllegalArgumentException): The header contains a duplicate name: "Right Ascension" in [1926, Right Ascension, Right Ascension, Right Ascension, Declination, Declination, log Δ, Brightness].]]

[2025-02-07T12:49:54.198-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestMessageBean] [tid: _ThreadID=2877 _ThreadName=orb-thread-pool-1 (pool #1): worker-4] [timeMillis: 1738950594198] [levelValue: 900] [[
  Error occurred during ingest job for file id 2527477!]]

[2025-02-07T12:49:54.618-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper] [tid: _ThreadID=91 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1738950594618] [levelValue: 900] [[
  Dataset not locked for EditInProgress ]]

[2025-02-07T12:49:56.722-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.search.SearchServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1738950596722] [levelValue: 800] [[
  The sub-query q1 set to discoverableBy:(group_388732-PTTP OR group_1-allusers)]]

[2025-02-07T12:49:56.722-0500] [Payara 6.2024.6] [FINE] [] [edu.harvard.iq.dataverse.search.SearchServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1738950596722] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.SearchServiceBean] [METHODNAME: getPermissionFilterQuery] [[
  Returning experimental query: publicObject_b:true OR {!join from=definitionPointDocId to=id v=$q1}]]

[2025-02-07T12:49:56.722-0500] [Payara 6.2024.6] [FINE] [] [edu.harvard.iq.dataverse.search.SearchServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1738950596722] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.SearchServiceBean] [METHODNAME: search] [[

[2025-02-07T12:49:56.736-0500] [Payara 6.2024.6] [FINE] [] [edu.harvard.iq.dataverse.search.SearchServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1738950596736] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.SearchServiceBean] [METHODNAME: search] [[
  status code of the query response: 0]]

[2025-02-07T12:49:56.736-0500] [Payara 6.2024.6] [FINE] [] [edu.harvard.iq.dataverse.search.SearchServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1738950596736] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.SearchServiceBean] [METHODNAME: search] [[
  _size from query response: 4]]

[2025-02-07T12:49:56.736-0500] [Payara 6.2024.6] [FINE] [] [edu.harvard.iq.dataverse.search.SearchServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1738950596736] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.SearchServiceBean] [METHODNAME: search] [[
  qtime: 12]]

[2025-02-07T12:49:56.737-0500] [Payara 6.2024.6] [FINE] [] [edu.harvard.iq.dataverse.search.SearchServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1738950596737] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.SearchServiceBean] [METHODNAME: search] [[
  score for dataset_2527438: 385.56293]]

[2025-02-07T12:49:56.738-0500] [Payara 6.2024.6] [FINE] [] [edu.harvard.iq.dataverse.search.SearchServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1738950596738] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.SearchServiceBean] [METHODNAME: search] [[
  score for dataset_2433661: 385.56293]]

[2025-02-07T12:49:56.745-0500] [Payara 6.2024.6] [FINE] [] [edu.harvard.iq.dataverse.search.SearchServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1738950596745] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.SearchServiceBean] [METHODNAME: search] [[
  actual filter queries: [license:"CC0 1.0", subject_ss:"Agricultural Sciences", authorAffiliation_ss:"Harvard University", dvObjectType:(dataverses OR datasets OR files), publicObject_b:true OR {!join from=definitionPointDocId to=id v=$q1}]]]

[2025-02-07T12:50:00.216-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600216] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.216-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600216] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 2","directoryLabel":"","mimeType":"text/plain","categories":["Data","BLATHWAYT’S COMET"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1885b54-dca49d5d5856","fileName":"HCOAnnouncement0001_0002_c.innodata.csv","checksum":{"@type":"MD5","@value":"aefb4e5ba2a1f6ad2763afbcb4297cad"}}]]

[2025-02-07T12:50:00.217-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600217] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.218-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600218] [levelValue: 800] [[
  HCOAnnouncement0001_0002_c.innodata.csv is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:50:00.260-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600260] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 30]]

[2025-02-07T12:50:00.262-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600262] [levelValue: 800] [[
  Incrementing recorded storage use by 30 bytes for dataset 2527479]]

[2025-02-07T12:50:00.267-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600267] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.267-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600267] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 2","directoryLabel":"","mimeType":"text/plain","categories":["Data","ENSOR'S COMET"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1885b6b-74110ab3696b","fileName":"HCOAnnouncement0001_0002_b.innodata.csv","checksum":{"@type":"MD5","@value":"4ba2c4a349f8f6bfb1f11497573b4009"}}]]

[2025-02-07T12:50:00.267-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600267] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.269-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600269] [levelValue: 800] [[
  HCOAnnouncement0001_0002_b.innodata.csv is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:50:00.299-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600299] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 34]]

[2025-02-07T12:50:00.299-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600299] [levelValue: 800] [[
  Incrementing recorded storage use by 34 bytes for dataset 2527479]]

[2025-02-07T12:50:00.304-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600304] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.304-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600304] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 2","directoryLabel":"","mimeType":"text/plain","categories":["Data","ENSOR'S COMET"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1885b6a-2888b66ea339","fileName":"HCOAnnouncement0001_0002_a.innodata.csv","checksum":{"@type":"MD5","@value":"c5c6893ccdbcd95715547e472f70fcdf"}}]]

[2025-02-07T12:50:00.304-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600304] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.306-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600306] [levelValue: 800] [[
  HCOAnnouncement0001_0002_a.innodata.csv is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:50:00.347-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600347] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 140]]

[2025-02-07T12:50:00.348-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600348] [levelValue: 800] [[
  Incrementing recorded storage use by 140 bytes for dataset 2527479]]

[2025-02-07T12:50:00.353-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600353] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.354-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600354] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 2","directoryLabel":"","mimeType":"text/plain","categories":["Data"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1885b54-ee9e0f43b846","fileName":"HCOAnnouncement0001_0002.innodata.txt","checksum":{"@type":"MD5","@value":"00c4a63f7bf9e9f3e6f953215d7a3b0b"}}]]

[2025-02-07T12:50:00.354-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600354] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.356-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600356] [levelValue: 800] [[
  HCOAnnouncement0001_0002.innodata.txt is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:50:00.425-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600425] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 1735]]

[2025-02-07T12:50:00.426-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600426] [levelValue: 800] [[
  Incrementing recorded storage use by 1735 bytes for dataset 2527479]]

[2025-02-07T12:50:00.431-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600431] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.431-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600431] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 2","directoryLabel":"","mimeType":"text/plain","categories":["Data"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1885c9e-1262f25a19a3","fileName":"HCOAnnouncement0001_0002.innodata.xml","checksum":{"@type":"MD5","@value":"73dd1676dfcd523c9a5be3acabd68155"}}]]

[2025-02-07T12:50:00.431-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600431] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.433-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600433] [levelValue: 800] [[
  HCOAnnouncement0001_0002.innodata.xml is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:50:00.466-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600466] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 48933]]

[2025-02-07T12:50:00.466-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600466] [levelValue: 800] [[
  Incrementing recorded storage use by 48933 bytes for dataset 2527479]]

[2025-02-07T12:50:00.471-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600471] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.471-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600471] [levelValue: 800] [[
  jsonData: {"description":"File contains OCR text with data from:  Announcement Card number: 2","directoryLabel":"","mimeType":"text/plain","categories":["Data"],"restrict":false,"storageIdentifier":"s3dir://iqssqa:194e1885c9f-312f78a787d1","fileName":"HCOAnnouncement0001_0002.innodata.jpg","checksum":{"@type":"MD5","@value":"3c9e3f455df9255b26c68b7c71e26325"}}]]

[2025-02-07T12:50:00.471-0500] [Payara 6.2024.6] [INFO] [] [] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600471] [levelValue: 800] [[
  -------------------------------]]

[2025-02-07T12:50:00.472-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600472] [levelValue: 800] [[
  HCOAnnouncement0001_0002.innodata.jpg is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2025-02-07T12:50:00.491-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean] [tid: _ThreadID=262 _ThreadName=__ejb-thread-pool2] [timeMillis: 1738950600491] [levelValue: 800] [[
  Removing existing harvest timers..]]

[2025-02-07T12:50:00.492-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean] [tid: _ThreadID=262 _ThreadName=__ejb-thread-pool2] [timeMillis: 1738950600492] [levelValue: 800] [[
  HarvesterService: checking timer 1]]

[2025-02-07T12:50:00.492-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean] [tid: _ThreadID=262 _ThreadName=__ejb-thread-pool2] [timeMillis: 1738950600492] [levelValue: 800] [[
  HarvesterService: checking timer 2]]

[2025-02-07T12:50:00.509-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600509] [levelValue: 800] [[
  directly uploaded file successfully saved. file size: 211376]]

[2025-02-07T12:50:00.510-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600510] [levelValue: 800] [[
  Incrementing recorded storage use by 211376 bytes for dataset 2527479]]

[2025-02-07T12:50:00.514-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950600514] [levelValue: 900] [[
  Dataset not locked for EditInProgress ]]

[2025-02-07T12:50:03.879-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=92 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1738950603879] [levelValue: 800] [[
  Ingest of 3 tabular data file(s) is in progress.]]

[2025-02-07T12:50:03.941-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=2878 _ThreadName=orb-thread-pool-1 (pool #1): worker-5] [timeMillis: 1738950603941] [levelValue: 800] [[
  Tabular data successfully ingested; DataTable with 2 variables produced.]]

[2025-02-07T12:50:03.941-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=2878 _ThreadName=orb-thread-pool-1 (pool #1): worker-5] [timeMillis: 1738950603941] [levelValue: 800] [[
  Tab-delimited file produced: /tmp/data-7917177489414053181.tab]]

[2025-02-07T12:50:03.967-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.api.Datasets] [tid: _ThreadID=91 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1738950603967] [levelValue: 800] [[
  edu.harvard.iq.dataverse.DatasetLock[ id=570247 ]]]

[2025-02-07T12:50:03.993-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper] [tid: _ThreadID=91 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1738950603993] [levelValue: 900] [[
  Dataset not locked for EditInProgress ]]

[2025-02-07T12:50:04.068-0500] [Payara 6.2024.6] [WARNING] [AS-EJB-00056] [jakarta.enterprise.ejb.container] [tid: _ThreadID=91 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1738950604068] [levelValue: 900] [[
  A system exception occurred during an invocation on EJB EjbDataverseEngineInner, method: public java.lang.Object edu.harvard.iq.dataverse.EjbDataverseEngineInner.submit(edu.harvard.iq.dataverse.engine.command.Command,edu.harvard.iq.dataverse.engine.command.CommandContext) throws edu.harvard.iq.dataverse.engine.command.exception.CommandException]]

[2025-02-07T12:50:04.068-0500] [Payara 6.2024.6] [WARNING] [] [jakarta.enterprise.ejb.container] [tid: _ThreadID=91 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1738950604068] [levelValue: 900] [[

jakarta.ejb.EJBException: Exception [EclipseLink-5010] (Eclipse Persistence Services - 4.0.1.payara-p2.v202310250827): org.eclipse.persistence.exceptions.OptimisticLockException
Exception Description: The object [[DatasetVersion id:283666]] cannot be merged because it has changed or been deleted since it was last read.
Class> edu.harvard.iq.dataverse.DatasetVersion
	at com.sun.ejb.containers.EJBContainerTransactionManager.processSystemException(EJBContainerTransactionManager.java:723)
	at com.sun.ejb.containers.EJBContainerTransactionManager.completeNewTx(EJBContainerTransactionManager.java:652)
	at com.sun.ejb.containers.EJBContainerTransactionManager.postInvokeTx(EJBContainerTransactionManager.java:482)
	at com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:4601)
	at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2134)
	at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2104)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:220)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
	at jdk.proxy102/jdk.proxy102.$Proxy613.submit(Unknown Source)
	at edu.harvard.iq.dataverse.__EJB31_Generated__EjbDataverseEngineInner__Intf____Bean__.submit(Unknown Source)
	at edu.harvard.iq.dataverse.EjbDataverseEngine.submit(EjbDataverseEngine.java:283)
	at jdk.internal.reflect.GeneratedMethodAccessor2540.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:588)
	at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:408)
	at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4835)
	at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:653)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:834)
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:603)
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
	at jdk.internal.reflect.GeneratedMethodAccessor214.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:603)
	at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:72)
	at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
	at jdk.internal.reflect.GeneratedMethodAccessor212.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
	at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:375)
	at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4807)
	at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4795)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
	at jdk.proxy102/jdk.proxy102.$Proxy617.submit(Unknown Source)
	at edu.harvard.iq.dataverse.__EJB31_Generated__EjbDataverseEngine__Intf____Bean__.submit(Unknown Source)
	at edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper.replaceFiles(AddReplaceFileHelper.java:2326)
	at edu.harvard.iq.dataverse.api.Datasets.replaceFilesInDataset(Datasets.java:4609)
	at edu.harvard.iq.dataverse.api.Datasets$Proxy$_$$_WeldSubclass.replaceFilesInDataset(Unknown Source)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:261)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:240)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:697)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:357)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:311)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
	at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1554)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:331)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:226)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at edu.harvard.iq.dataverse.api.ApiBlockingFilter.doFilter(ApiBlockingFilter.java:170)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:31)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at org.apache.catalina.core.ApplicationDispatcher.doInvoke(ApplicationDispatcher.java:816)
	at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:683)
	at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:527)
	at org.apache.catalina.core.ApplicationDispatcher.doDispatch(ApplicationDispatcher.java:497)
	at org.apache.catalina.core.ApplicationDispatcher.dispatch(ApplicationDispatcher.java:379)
	at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:329)
	at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:35)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:257)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:166)
	at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:757)
	at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:577)
	at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:158)
	at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:757)
	at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:577)
	at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:369)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:239)
	at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
	at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
	at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:174)
	at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:153)
	at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:196)
	at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:88)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:246)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:178)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:118)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:96)
	at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:51)
	at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:510)
	at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:82)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:83)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:101)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:535)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:515)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: jakarta.persistence.OptimisticLockException: Exception [EclipseLink-5010] (Eclipse Persistence Services - 4.0.1.payara-p2.v202310250827): org.eclipse.persistence.exceptions.OptimisticLockException
Exception Description: The object [[DatasetVersion id:283666]] cannot be merged because it has changed or been deleted since it was last read.
Class> edu.harvard.iq.dataverse.DatasetVersion
	at org.eclipse.persistence.internal.jpa.EntityManagerImpl.mergeInternal(EntityManagerImpl.java:646)
	at org.eclipse.persistence.internal.jpa.EntityManagerImpl.merge(EntityManagerImpl.java:622)
	at com.sun.enterprise.container.common.impl.EntityManagerWrapper.merge(EntityManagerWrapper.java:307)
	at edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand.execute(UpdateDatasetVersionCommand.java:150)
	at edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand.execute(UpdateDatasetVersionCommand.java:31)
	at edu.harvard.iq.dataverse.EjbDataverseEngineInner.submit(EjbDataverseEngineInner.java:36)
	at jdk.internal.reflect.GeneratedMethodAccessor2542.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:588)
	at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:408)
	at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4835)
	at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:653)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:834)
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:603)
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
	at jdk.internal.reflect.GeneratedMethodAccessor214.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:603)
	at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:72)
	at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
	at jdk.internal.reflect.GeneratedMethodAccessor212.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
	at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:375)
	at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4807)
	at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4795)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
	... 113 more
Caused by: Exception [EclipseLink-5010] (Eclipse Persistence Services - 4.0.1.payara-p2.v202310250827): org.eclipse.persistence.exceptions.OptimisticLockException
Exception Description: The object [[DatasetVersion id:283666]] cannot be merged because it has changed or been deleted since it was last read.
Class> edu.harvard.iq.dataverse.DatasetVersion
	at org.eclipse.persistence.exceptions.OptimisticLockException.objectChangedSinceLastMerge(OptimisticLockException.java:154)
	at org.eclipse.persistence.internal.sessions.MergeManager.mergeChangesOfCloneIntoWorkingCopy(MergeManager.java:596)
	at org.eclipse.persistence.internal.sessions.MergeManager.mergeChanges(MergeManager.java:324)
	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.mergeCloneWithReferences(UnitOfWorkImpl.java:3661)
	at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.mergeCloneWithReferences(RepeatableWriteUnitOfWork.java:402)
	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.mergeCloneWithReferences(UnitOfWorkImpl.java:3621)
	at org.eclipse.persistence.internal.jpa.EntityManagerImpl.mergeInternal(EntityManagerImpl.java:644)
	... 146 more
]]

[2025-02-07T12:50:04.081-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=2878 _ThreadName=orb-thread-pool-1 (pool #1): worker-5] [timeMillis: 1738950604081] [levelValue: 800] [[
  Tabular data successfully ingested; DataTable with 2 variables produced.]]

[2025-02-07T12:50:04.081-0500] [Payara 6.2024.6] [INFO] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=2878 _ThreadName=orb-thread-pool-1 (pool #1): worker-5] [timeMillis: 1738950604081] [levelValue: 800] [[
  Tab-delimited file produced: /tmp/data-8007771720148526842.tab]]

[2025-02-07T12:50:04.184-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=2878 _ThreadName=orb-thread-pool-1 (pool #1): worker-5] [timeMillis: 1738950604184] [levelValue: 900] [[
  Ingest failure (Exception class java.lang.IllegalArgumentException): The header contains a duplicate name: "Right Ascension" in [Date, Right Ascension, Right Ascension, Declination, Declination, Magnitude].]]

[2025-02-07T12:50:04.186-0500] [Payara 6.2024.6] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestMessageBean] [tid: _ThreadID=2878 _ThreadName=orb-thread-pool-1 (pool #1): worker-5] [timeMillis: 1738950604186] [levelValue: 900] [[
  Error occurred during ingest job for file id 2527483!]]

@qqmyers (Member) commented Feb 14, 2025

FWIW: Without checking all the details, I think it roughly makes sense if the files are being added separately. After the upload of an ingestable file, there should be an ingest lock. If that's ignored, it's possible that the ingest could finish and update the dataset, causing other updates (like new file adds) to fail, since they are working on an outdated copy of the dataset. My guess is that, as with #10793, we aren't handling it well. We should probably either be checking locks earlier or catching the 500 to give a better error. Even with that, you'd still have to retry uploads. A bigger refactor, along the lines of #10856, might make it possible for ingest to not update the overall version and therefore not cause an optimistic lock exception.

A work-around would be to check for locks and wait until they are released before adding more files, or to use the multiple-file add (available for direct upload via pyDvuploader; not sure if pyDataverse can do it).
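
For illustration, a minimal sketch of that lock-checking work-around, using the documented native API endpoint GET /api/datasets/{id}/locks; the function name, parameters, and values are hypothetical:

# Poll the dataset's locks endpoint and block until no locks remain,
# so the next file-registration request doesn't race an ingest.
# The timeout/poll defaults and all names here are assumptions.
import time
import requests

def wait_for_unlock(server_url, api_token, dataset_id, timeout=300, poll=2):
    deadline = time.time() + timeout
    while time.time() < deadline:
        r = requests.get(
            f"{server_url}/api/datasets/{dataset_id}/locks",
            headers={"X-Dataverse-key": api_token},
        )
        r.raise_for_status()
        if not r.json().get("data"):  # an empty list means no active locks
            return
        time.sleep(poll)
    raise TimeoutError(f"dataset {dataset_id} still locked after {timeout}s")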

@landreev
Copy link
Contributor

landreev commented Feb 14, 2025

I thought python-dvuploader v0.2.3 was in fact using /addfiles (batch add of all the files at once at the end of the direct upload workflow). Or am I thinking of pyDvuploader? - I may be a bit confused about the distinction between the two.
The explanation of why it would cause an optimistic lock with the update-after-each-upload method makes sense. It does not immediately explain why the same uploads are working on demo, but I can imagine how differences in timing on the prod servers could account for this. [edit: sorry, I apparently misread the opening description; if it's bombing reproducibly in both places, that's less weirdness to worry about]

@kmika11 I would absolutely recommend trying to use an implementation of direct upload that DOES take advantage of the batch update, for all sorts of other reasons outside of this specific problem.

@landreev
Copy link
Contributor

I just edited my comment above - I apparently misread the opening description; if it's bombing reproducibly in both places, demo and prod., that's less weirdness to worry about. So, 🎉 .

@JR-1991
Copy link
Contributor

JR-1991 commented Feb 14, 2025

@landreev I am utilizing the batch upload (see here and here) in the pyDVUploader. Am I using it wrong, or are there some things to consider? I've been following the docs as closely as possible.

@landreev
Copy link
Contributor

landreev commented Feb 16, 2025

@JR-1991 thanks. Please note that I was replying to @qqmyers' comment, where he suggested that finalizing one file at a time may have been the issue. Your implementation of direct upload appears to be doing the right thing, just as Jim and I assumed. I.e., using /api/datasets/:persistentId/addFiles to finalize all the directly-uploaded files, all at once, at the end of the workflow. (early implementations of direct upload were using /api/datasets/:persistentId/add instead, finalizing each file separately, after each file is directly-uploaded).
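For anyone following along, here is roughly what the batch finalization looks like at the HTTP level, per the direct-upload API docs (the helper is hypothetical and the jsonData fields are abbreviated):

import json

import requests

def finalize_batch(base_url, pid, api_token, entries):
    # entries: one dict per directly-uploaded file, e.g.
    # {"storageIdentifier": "s3://...", "fileName": "obs.csv", "mimeType": "text/csv"}
    resp = requests.post(
        f"{base_url}/api/datasets/:persistentId/addFiles",
        params={"persistentId": pid},
        headers={"X-Dataverse-key": api_token},
        files={"jsonData": (None, json.dumps(entries))},  # multipart form field
    )
    resp.raise_for_status()
    return resp.json()

The older per-file pattern instead POSTs to /api/datasets/:persistentId/add once per file, which updates the dataset version after every upload.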

A work-around would be to ... or using the multiple file add (available for direct upload via pyDvuploader

@qqmyers was your comment written under the assumption that direct upload was NOT enabled for the dataset? - because it IS enabled for the dataset above (or did I miss something else)

... Oh, wait - are we running into the optimistic lock despite the fact that multi-file add is used; simply because the file in question is so tiny, it ingests before the rest of /addFiles finishes doing its thing? - it is 140 bytes, if I'm reading it correctly.

@qqmyers
Copy link
Member

qqmyers commented Feb 16, 2025

From the stack trace, it looks like the single-file version is getting called, but it is direct upload. The json is getting printed per file, I think by

msgt("(api) jsonData: " + jsonData);
and that json does include the storageIdentifier (hence direct).

@landreev
Copy link
Contributor

Hmmm. I'm not seeing the single-file /add anywhere in directupload.py. So, we need to look at Katie's notebook and see if the code there is in fact using Jan's python-dvuploader, and/or if it ends up using it but in a way that somehow reverts to still calling /add after each upload.
This is all strange, but interesting.

@landreev
Copy link
Contributor

... that said, I'm not seeing the words (api) jsonData: in the posted stacktrace (?).

@landreev
Copy link
Contributor

That json print in the stacktrace may be from here:

and, I think, OptionalFileParams is created from json for each file from inside

public Response addFiles(String jsonData, Dataset dataset, User authUser, boolean trustSuppliedFileSizes) {

@qqmyers
Copy link
Member

qqmyers commented Feb 16, 2025

Ah - you're right. The logging in Datasets is set to fine at that point.

That said, the msgt in OptionalFileParams (which is just System.out.println) would get called for the single- or multiple-file api calls, and ingest is only started after the UpdateDatasetVersionCommand in the addFiles code - at

ingestService.startIngestJobsForDataset(dataset, (AuthenticatedUser) authUser);

- so it's hard to see how it could interfere.

@landreev
Copy link
Contributor

landreev commented Feb 16, 2025

I'm not really working today; would need to take a closer look at the sequence of events there. But I think this line in the stacktrace suggests that it was an /addFiles (or /replaceFiles rather), multi-file call:

Ingest of 3 tabular data file(s) is in progress.

What is the database transaction here - is it the execution of the UpdateDatasetVersionCommand? Or is the entire API call executed in one transaction? If it is the latter, I can see how an ingest of 3 really tiny files can beat the command, and result in an optimistic lock, maybe?

@qqmyers
Copy link
Member

qqmyers commented Feb 16, 2025

Makes sense. The command only supports a transaction, so it wouldn't create a new one if one was already started.

@landreev
Copy link
Contributor

@kmika11 The preliminary investigation appears to suggest that this is NOT the result of anything in your python code, or anything wrong with the python implementation of direct upload that you are using. What seems to be causing this is just the fact that the 3 ingestable files in the batch are so microscopically small; because of that they ingest too fast for their own good, in a way that interferes with the main update (the "registering files" step).

This is objectively a bug in Dataverse, and it should be possible to prevent this from happening on our end. In more immediate terms, there should be some quick workarounds that would allow you to upload these files. Operating under the assumption that you probably don't need these files to be ingested as tabular data (it's somewhat unlikely that any useful tabular metadata can be extracted from a 34-byte csv file - ?): the simplest way to avoid having these files ingested is to rename them .txt instead of .csv (and/or change the mime type to text/plain if you are sending text/csv for them explicitly).
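A sketch of both quick workarounds (purely illustrative; the helper and staging layout are invented):

import shutil

from pathlib import Path

def stage_without_ingest(src, staging_dir):
    # Copy a tiny CSV under a .txt name so Dataverse skips tabular ingest.
    dest = Path(staging_dir) / (Path(src).stem + ".txt")
    shutil.copy2(src, dest)
    return dest

# Alternatively, keep the .csv name but declare the file as text/plain in the
# /addFiles jsonData entry, e.g. {"fileName": "obs.csv", "mimeType": "text/plain", ...}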

I may have some questions about how the notebook is handling the uploads (the use of replace specifically); I may ask separately.

@JR-1991 It is not explicitly mentioned in the README.md, but can I pass the tabIngest: false key in the file config, to programmatically prevent a Stata, csv etc. file from being ingested as tabular?

@JR-1991
Copy link
Contributor

JR-1991 commented Feb 18, 2025

@landreev, the tabIngest field was missing from the related object. I've added it and submitted a PR gdcc/python-dvuploader#24 to fix this issue. Additionally, I tested it with several CSV files and managed to replicate the error on https://demo.dataverse.org with the direct upload. When I disable tabIngest, everything functions correctly.
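With that PR, per-file ingest control would look roughly like this (the tab_ingest field name is an assumption based on the PR; the PID and token are placeholders):

from dvuploader import DVUploader, File

files = [
    # tab_ingest=False keeps the CSV's real mime type but asks Dataverse
    # not to ingest it as tabular data (field name assumed from the PR).
    File(filepath="announcements/tiny_table.csv", tab_ingest=False),
]

DVUploader(files=files).upload(
    persistent_id="doi:10.70122/FK2/XXXXXX",  # placeholder PID
    dataverse_url="https://demo.dataverse.org",
    api_token="my-api-token",  # placeholder token
)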

Here’s the error message I get when ingestion is enabled:

CommandException updating DatasetVersion from addFiles job: Command
edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand@2d324eb6 failed:
Exception [EclipseLink-5010] (Eclipse Persistence Services - 4.0.1.payara-p2.v202310250827):
org.eclipse.persistence.exceptions.OptimisticLockException
Exception Description: The object [edu.harvard.iq.dataverse.FileMetadata[id=2287493]] cannot be merged because it has changed or been deleted since it was last read.
Class> edu.harvard.iq.dataverse.FileMetadata

@kmika11
Copy link
Contributor

kmika11 commented Feb 18, 2025

I may have some questions about how the notebook is handling the uploads (the use of replace specifically); I may ask separately.

@landreev I can't speak to the use of replace as it's identified in the error message here. I'm using easyDataverse and its implementation of dvuploader & direct upload for uploading files. My code doesn't build the url to use /replaceFiles - I just use dataset.add_file() in a function in cell 3 and then dataset.upload() in cell 17, which results in the error. I updated https://github.com/kmika11/harvard-college-observatory-announcements/blob/main/curation_script_HCO.ipynb to make this clearer. I also had a workaround in there to upload empty datasets and then directly use the python dvuploader to add files to datasets, but received the same error.

I do not explicitly identify the mime type because there isn't a mimetype argument in dataset.add_file(). Not ingesting the tiny csvs would be fine, but they can't be reformatted as .txt files. We would like to have them identified correctly in the metadata.

A final note - the files appear to upload fine, and do get correctly assigned to the right dataset. And the dataset does not appear locked when I review it in a browser. I just get a 500 error that interrupts multiple dataset deposits that are looped.

@JR-1991
Copy link
Contributor

JR-1991 commented Feb 18, 2025

@kmika11 easyDataverse utilizes python-dvuploader on the backend, and indeed the add_file method does not expose those details. Good point; it's worth making this more explicit.

After the PR is approved, I will release the new version and update easyDataverse accordingly 🙌

@landreev
Copy link
Contributor

@kmika11

Not ingesting the tiny csvs would be fine, but they can't be reformatted as .txt files.

Since Jan is adding support for the tabIngest: false flag to python-dvuploader, that sounds like the cleanest, non-hacky solution.

@landreev
Copy link
Contributor

@kmika11 @JR-1991
I'm not super invested in this... but it's bothering me a little that I still don't understand what the uploader is actually doing. The whole process is definitely bombing inside the /replaceFiles API call, and it is still entirely unclear why this call is made in the first place.
What I'm seeing in the logs on demo is that 6 files are successfully added to the dataset, with 6 /uploadurls calls, then a successful /addFiles call; this is how the dataset ends up with the 6 files in it (the createdate stamps on the files match the time of the /addFiles call). But then immediately after there is a /replaceFiles call from the client, and that fails with a 500:

"98.97.23.92" "07/Feb/2025:12:49:55 -0500" "GET /api/datasets/:persistentId/?persistentId=doi:10.70122/FK2/0HMSY2 HTTP/1.1" 200 5055 "python-requests/2.31.0" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:49:57 -0500" "GET /api/datasets/:persistentId/uploadurls?key=5187fd33-cef0-42b9-a3ea-95ac104afda9&persistentId=doi:10.70122/FK2/0HMSY2&size=1024 HTTP/1.1" 200 476 "python-requests/2.31.0" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:49:57 -0500" "GET /api/datasets/:persistentId/uploadurls?key=5187fd33-cef0-42b9-a3ea-95ac104afda9&persistentId=doi:10.70122/FK2/0HMSY2&size=1735 HTTP/1.1" 200 476 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:49:57 -0500" "GET /api/datasets/:persistentId/uploadurls?key=5187fd33-cef0-42b9-a3ea-95ac104afda9&persistentId=doi:10.70122/FK2/0HMSY2&size=140 HTTP/1.1" 200 476 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:49:57 -0500" "GET /api/datasets/:persistentId/uploadurls?key=5187fd33-cef0-42b9-a3ea-95ac104afda9&persistentId=doi:10.70122/FK2/0HMSY2&size=34 HTTP/1.1" 200 476 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:49:57 -0500" "GET /api/datasets/:persistentId/uploadurls?key=5187fd33-cef0-42b9-a3ea-95ac104afda9&persistentId=doi:10.70122/FK2/0HMSY2&size=30 HTTP/1.1" 200 476 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:49:57 -0500" "GET /api/datasets/:persistentId/uploadurls?key=5187fd33-cef0-42b9-a3ea-95ac104afda9&persistentId=doi:10.70122/FK2/0HMSY2&size=48933 HTTP/1.1" 200 476 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:49:57 -0500" "GET /api/datasets/:persistentId/uploadurls?key=5187fd33-cef0-42b9-a3ea-95ac104afda9&persistentId=doi:10.70122/FK2/0HMSY2&size=211376 HTTP/1.1" 200 476 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:50:03 -0500" "POST /api/datasets/:persistentId/addFiles?persistentId=doi:10.70122/FK2/0HMSY2 HTTP/1.1" 200 5289 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:50:03 -0500" "POST /api/datasets/:persistentId/replaceFiles?persistentId=doi:10.70122/FK2/0HMSY2 HTTP/1.1" 500 533 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID"

Why is this /replaceFiles call there? @JR-1991 is there a chance that there is some kind of a bug in python-dvuploader that issues this extra call unnecessarily?
I am also seeing this extra /replaceFiles in the access log for the dataset right above, for which the script appears to have worked:

"98.97.23.92" "07/Feb/2025:12:49:54 -0500" "POST /api/datasets/:persistentId/addFiles?persistentId=doi:10.70122/FK2/OEYLUQ HTTP/1.1" 200 4409 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID" 
"98.97.23.92" "07/Feb/2025:12:49:54 -0500" "POST /api/datasets/:persistentId/replaceFiles?persistentId=doi:10.70122/FK2/OEYLUQ HTTP/1.1" 200 114 "Python/3.12 aiohttp/3.9.5" "NULL-REFERER" "NULL-COOKIE-JSESSIONID"

@JR-1991
Copy link
Contributor

JR-1991 commented Feb 19, 2025

@landreev there is a mechanism to check whether a file already exists and differs in hash. If so, the uploader flags it as to_replace and in the replacement step calls /replaceFiles. This is the part handling the distribution and this one prepares the JSON payload.

I noticed that the request is sent regardless of the payload content, including empty cases, which is definitely a bug. I have opened an issue regarding this and will add it to the tabIngest PR.

Edit: Added the parts of code to the PR and tested both with replacement and novel files. Now works as expected.
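Not the uploader's actual code, but the decision logic described above, plus the new empty-list guard, reduces to something like this sketch:

import hashlib

def md5_of(path):
    # Stream the file so tiny and large files are hashed the same way.
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def split_new_vs_replace(local, remote):
    # local: file name -> local path; remote: file name -> known MD5 in the dataset.
    new, to_replace = [], []
    for name, path in local.items():
        if name not in remote:
            new.append(path)  # brand-new file: goes through /addFiles
        elif md5_of(path) != remote[name]:
            to_replace.append(path)  # same name, changed content: /replaceFiles
    return new, to_replace

# The guard from the PR, in spirit: never POST /replaceFiles with an empty list.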

@landreev
Copy link
Contributor

@JR-1991 Thank you so much for addressing these issues so quickly!
This /replaceFiles is sent with an empty list of files in this specific case, correct? - That sounds like a bug on the Dataverse side as well, because Dataverse apparently proceeds to execute the UpdateDatasetVersionCommand even when it didn't have to replace any files - and that's where the script is bombing. We'll fix this on our end as well; it's just a longer and more arduous process to get something merged and added to a Dataverse release (unfortunately).

@JR-1991
Copy link
Contributor

JR-1991 commented Feb 19, 2025

@landreev Thanks too for the hints! Helps a lot to debug the uploader 😊

This /replaceFiles is sent with an empty list of files in this specific case, correct?

Yes, according to the code, an empty list should be sent when the files are entirely new in the dataset.

We'll fix this on our end as well; but it's just a longer and more arduous process, to get something merged and added to a Dataverse release (unfortunately).

Perfect! The PR includes a condition not to send anything if the list is empty. Thus, we can work around this in the meantime.

@qqmyers
Copy link
Member

qqmyers commented Feb 19, 2025

FWIW: If I'm following correctly - sounds like the empty replace can be fixed on both sides, but if the replace call is ever not empty, you may need to wait or risk an OptimisticLockException from the indexing, ingest, etc. started by the /addFiles call.

@JR-1991
Copy link
Contributor

JR-1991 commented Feb 19, 2025

@qqmyers good point, I will add a lock-check to be safe.

@landreev
Copy link
Contributor

Correct. In this instance it was caused by an ingest that was too fast conflicting with an empty /replace. But it should be entirely possible to run into this with a longer-running ingest or indexing and a similarly expensive /replace; they just need to overlap in a certain way.
So, hooray for locks! 🎉
