add version handling #103

Open: wants to merge 2 commits into base branch main.
21 changes: 17 additions & 4 deletions datalad_dataverse/remote.py
@@ -13,12 +13,11 @@

 from datalad_dataverse.utils import (
     get_native_api,
+    format_doi,
 )
 import os
 import re
 
-from datalad_dataverse.utils import format_doi
-
 
 from datalad.customremotes import SpecialRemote
 
@@ -31,10 +30,12 @@ def __init__(self, *args):
         self.configs['dlacredential'] = \
             'Identifier used to retrieve an API token from a local ' \
             'credential store'
+        self.configs['version'] = 'Dataverse dataset version'
         self._doi = None
         self._url = None
         self._api = None
         self._token = None
+        self._dataset_version = None
 
     def initremote(self):
         """
@@ -140,6 +141,14 @@ def api(self):
 
         return self._api
 
+    @property
+    def dataset_version(self):
+        if self._dataset_version is None:
+            self._dataset_version = self.annex.getconfig('version')
+            if len(self._dataset_version) == 0:
+                self._dataset_version = ":latest"
+        return self._dataset_version
+
     def prepare(self):
         # trigger API instance in order to get possibly auth/connection errors
         # right away
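The new lazy property above reads the `version` remote configuration once and falls back to `:latest` when it is unset. A standalone sketch of the same pattern, with the `self.annex.getconfig('version')` call stubbed out by a plain string (git-annex's GETCONFIG returns an empty string for an absent setting):

```python
class VersionedRemoteSketch:
    """Standalone sketch of the lazy dataset_version property.

    `configured` stands in for self.annex.getconfig('version'),
    which yields '' when the setting is absent.
    """

    def __init__(self, configured=''):
        self._dataset_version = None
        self._configured = configured

    @property
    def dataset_version(self):
        # resolve once on first access, then serve the cached value
        if self._dataset_version is None:
            self._dataset_version = self._configured
            if len(self._dataset_version) == 0:
                self._dataset_version = ':latest'
        return self._dataset_version


print(VersionedRemoteSketch().dataset_version)       # -> :latest
print(VersionedRemoteSketch('1.2').dataset_version)  # -> 1.2
```

Caching the resolved value means the config lookup happens at most once per remote process, which matters because the property is consulted on every store/remove call.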
@@ -158,6 +167,8 @@ def checkpresentexport(self, key, remote_file):
         return self.checkpresent(key=remote_file)
 
     def transfer_store(self, key, local_file, datafile=None):
+        if not self.dataset_version == ':latest':
Collaborator (Author) commented:

This and line 205 are not working.

Collaborator (Author) commented:

Well, it seems to work. Only datalad push is ignoring exceptions?

ksarink (Collaborator, Author) commented on Jun 18, 2022:
When no exception is raised, datalad push actually works and a DRAFT is created on Dataverse.

The log is exactly the same when it fails. It still says it succeeded, but nothing changed on Dataverse:

publish(ok): . (dataset) [refs/heads/git-annex->origin:refs/heads/git-annex 62d6084..a5c49a4]
publish(ok): . (dataset) [refs/heads/master->origin:refs/heads/master 424537b..37ed9ae]
action summary:
  publish (ok: 2)

Collaborator (Author) commented:
It seems to work when the files handled by annex are pushed.
It does not work when it is a git remote.

Steps to reproduce:

git clone "datalad-annex::?type=external&externaltype=dataverse&encryption=none&url=http://localhost:8080/&doi=doi:10.5072/FK2/JSUZ6P" directory
cp filexy directory
cd directory
datalad save && datalad push --to origin

Member commented:
I suppose that is misleading, because you pushed that before, right? Hence, git-annex knows the file is there already and doesn't even try to TRANSFER-STORE. Needs to be fresh and with no pushed content yet.

Member commented:
To clarify: if you look at the results, there are only two: one for each branch pushed. No content was pushed whatsoever, and no error was reported from annex. Hence, this push call didn't even try.

ksarink (Collaborator, Author) commented on Jun 18, 2022:
Yeah, there are no new files, since only the datalad metadata gets uploaded. These are just the two files that change when new files are saved into the datalad instance.

[Screenshot from 2022-06-18 14-48-19]

I am not sure that this really isn't an issue.

Member commented:
Ok, that doesn't look right indeed. I'll have a look at it on Monday!

Thanks a ton so far!

+            raise RuntimeError('Cannot store files if a specific version is checked out!')
         ds_pid = self.doi
         if datafile is None:
             datafile = Datafile()
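The guard added in transfer_store (and mirrored in remove) refuses write operations unless the mutable `:latest` version is configured, since published Dataverse versions are immutable. A minimal sketch of the behaviour, using a hypothetical free-standing helper rather than the inline check from the diff:

```python
def require_latest(version, action='store files'):
    """Hypothetical helper mirroring the inline version guard:
    only the draft-modifiable ':latest' version accepts writes."""
    if version != ':latest':
        raise RuntimeError(
            f'Cannot {action} if a specific version is checked out!')


require_latest(':latest')  # no-op for the default draft version
try:
    require_latest('1.0')
except RuntimeError as exc:
    print(exc)  # -> Cannot store files if a specific version is checked out!
```

Raising RuntimeError here is what the special remote protocol turns into a TRANSFER-FAILURE reply, which is why the discussion below focuses on whether datalad push actually surfaces that failure.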
@@ -185,12 +196,12 @@ def transfer_retrieve(self, key, file):
             # this relies on having established the NativeApi in prepare()
             api_token=self._token,
         )
-        dataset = self.api.get_dataset(identifier=self.doi)
+        dataset = self.api.get_dataset_version(identifier=self.doi, version=self.dataset_version)
 
         # http error handling
         dataset.raise_for_status()
 
-        files_list = dataset.json()['data']['latestVersion']['files']
+        files_list = dataset.json()['data']['files']
 
         # find the file we want to download
         file_id = None
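Note the shape change that motivates the second hunk above: a version-specific query returns its file listing directly under `data['files']`, whereas the whole-dataset query nested it under `data['latestVersion']['files']`. A sketch of the subsequent file lookup over a trimmed, hypothetical sample of such a response (real responses carry many more fields per entry):

```python
# Hypothetical, trimmed response from a versioned dataset query.
sample = {
    'data': {
        'files': [
            {'dataFile': {'id': 7, 'filename': 'a.txt'}},
            {'dataFile': {'id': 9, 'filename': 'b.bin'}},
        ],
    },
}


def find_file_id(dataset_json, name):
    """Return the numeric datafile id for `name`, or None if absent."""
    for entry in dataset_json['data']['files']:
        if entry['dataFile']['filename'] == name:
            return entry['dataFile']['id']
    return None


print(find_file_id(sample, 'b.bin'))  # -> 9
```

The id found this way is what the remote then feeds to the data-access API to download the actual file content.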
@@ -214,6 +225,8 @@ def transferexport_retrieve(self, key, local_file, remote_file):
         self.transfer_retrieve(key=remote_file, file=local_file)
 
     def remove(self, key):
+        if not self.dataset_version == ':latest':
+            raise RuntimeError('Cannot remove file if a specific version is checked out!')
         # get the dataset and a list of all files
         dataset = self.api.get_dataset(identifier=self.doi)
         # http error handling