
Commit 3b5bc03

[SYNPY-1575] Introduce EntityView model (#1181)
* Introduce EntityView model
1 parent e17fbc8 commit 3b5bc03

38 files changed: +4970 -304 lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
# EntityView
2+
3+
Contained within this file are experimental interfaces for working with the Synapse Python
4+
Client. Unless otherwise noted these interfaces are subject to change at any time. Use
5+
at your own risk.
6+
7+
## API reference
8+
9+
::: synapseclient.models.EntityView
10+
options:
11+
inherited_members: true
12+
members:
13+
- store_async
14+
- get_async
15+
- delete_async
16+
- update_rows_async
17+
- query_async
18+
- query_part_mask_async
19+
- snapshot_async
20+
- add_column
21+
- reorder_column
22+
- delete_column
23+
- get_acl_async
24+
- get_permissions_async
25+
- set_permissions_async
26+
---
@@ -0,0 +1,26 @@
1+
# EntityView
2+
3+
Contained within this file are experimental interfaces for working with the Synapse Python
4+
Client. Unless otherwise noted these interfaces are subject to change at any time. Use
5+
at your own risk.
6+
7+
## API reference
8+
9+
::: synapseclient.models.EntityView
10+
options:
11+
inherited_members: true
12+
members:
13+
- store
14+
- get
15+
- delete
16+
- update_rows
17+
- query
18+
- query_part_mask
19+
- snapshot
20+
- add_column
21+
- reorder_column
22+
- delete_column
23+
- get_acl
24+
- get_permissions
25+
- set_permissions
26+
---

docs/tutorials/python/dataset.md

+1-1
@@ -1,5 +1,5 @@
11
# Datasets
2-
Datasets in Synapse are a way to organize, annotate, and publish sets of files for others to use. Datasets behave similarly to Tables and FileViews, but provide some default behavior that makes it easy to put a group of files together.
2+
Datasets in Synapse are a way to organize, annotate, and publish sets of files for others to use. Datasets behave similarly to Tables and EntityViews, but provide some default behavior that makes it easy to put a group of files together.
33

44
This tutorial will walk through basics of working with datasets using the Synapse Python client.
55

docs/tutorials/python/entityview.md

+140
@@ -0,0 +1,140 @@
1+
# EntityViews
2+
EntityViews in Synapse allow you to create a queryable view that provides a unified selection
3+
of entities stored in different locations within your Synapse project. This can be
4+
particularly useful for managing and querying metadata across multiple files, folders,
5+
or projects that you manage.
6+
7+
Views display rows and columns of information, and they can be shared and queried with
8+
SQL. Views are queries of other data already in Synapse. They allow you to see groups
9+
of entities including files, tables, folders, or datasets and any associated
10+
annotations about those items.
11+
12+
Annotations are an essential component to building a view. Annotations are labels that
13+
you apply to your data, stored as key-value pairs in Synapse. They help users search
14+
for and find data, and they are a powerful tool used to systematically group and
15+
describe things in Synapse.
16+
17+
This tutorial will follow a [Flattened Data Layout](../../explanations/structuring_your_project.md#flattened-data-layout-example). With a project that has this example layout:
18+
```
19+
.
20+
21+
└── single_cell_RNAseq_batch_1
22+
├── SRR12345678_R1.fastq.gz
23+
└── SRR12345678_R2.fastq.gz
24+
```
25+
26+
## Tutorial Purpose
27+
In this tutorial you will:
28+
29+
1. Create an EntityView with a number of columns
30+
2. Query the EntityView
31+
3. Update rows in the EntityView
32+
4. Update the scope of your EntityView
33+
5. Update the types of entities in your EntityView
34+
35+
## Prerequisites
36+
* This tutorial assumes that you have a project in Synapse with one or more
37+
files/folders. It does not need to match the given structure in this tutorial, but if
38+
you do not have this already set up, you may reference the [Folder](./folder.md)
39+
and [File](./file.md) tutorials.
40+
* Pandas must be installed as shown in the [installation documentation](../installation.md)
41+
42+
43+
## 1. Find the Synapse ID of your project
44+
45+
First, let's set up some constants we'll use in this script and find the ID of our project.
46+
```python
47+
{!docs/tutorials/python/tutorial_scripts/entityview.py!lines=5-22}
48+
```
49+
50+
## 2. Create an EntityView with Columns
51+
52+
Now, we will create 4 columns to add to our EntityView. Recall that any data added to
53+
these columns will be stored as an annotation on the underlying File.
54+
55+
```python
56+
{!docs/tutorials/python/tutorial_scripts/entityview.py!lines=24-31}
57+
```
58+
59+
Next, we're going to store what we have to Synapse and print out the results.
60+
61+
```python
62+
{!docs/tutorials/python/tutorial_scripts/entityview.py!lines=33-47}
63+
```
64+
65+
## 3. Query the EntityView
66+
67+
```python
68+
{!docs/tutorials/python/tutorial_scripts/entityview.py!lines=49-54}
69+
```
70+
71+
<details class="example">
72+
<summary>The result of querying your EntityView should look like:</summary>
73+
```
74+
id name species dataType...
75+
0 syn1 SRR12345678_R1.fastq.gz Homo sapiens geneExpression
76+
1 syn2 SRR12345678_R2.fastq.gz Homo sapiens geneExpression
77+
```
78+
</details>
79+
80+
## 4. Update rows in the EntityView
81+
82+
Now that we know the data is present in the EntityView, let's go ahead and update the
83+
annotations on these Files. The following code sets all returned rows to a single
84+
value. Since the results were returned as a Pandas DataFrame you have many
85+
options to search through and set values on your data.
86+
87+
```python
88+
{!docs/tutorials/python/tutorial_scripts/entityview.py!lines=56-66}
89+
```
90+
91+
A note on `wait_for_eventually_consistent_view`: EntityViews in Synapse are eventually
92+
consistent, meaning that updates to data may take some time to be reflected in the
93+
view. The `wait_for_eventually_consistent_view` flag allows the code to pause until
94+
the changes are fully propagated to your EntityView. When this flag is set to `True` a
95+
query is automatically executed on the view to determine if the view contains the
96+
updated changes. It will allow your next query on your view to reflect any changes that
97+
you made. Conversely, if this is set to `False`, there is no guarantee that your next
98+
query will reflect your most recent changes.
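The polling behaviour described above can be sketched in plain Python. This is a minimal illustration, not the client's implementation: `wait_until_consistent`, `LaggingView`, and the timeout values are hypothetical names and numbers chosen for the example, and the stand-in object simply returns a stale value for its first few reads.

```python
import time
from typing import Callable


def wait_until_consistent(
    fetch_value: Callable[[], str],
    expected: str,
    timeout_seconds: float = 5.0,
    poll_interval_seconds: float = 0.01,
) -> bool:
    """Poll `fetch_value` until it returns `expected` or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if fetch_value() == expected:
            return True
        time.sleep(poll_interval_seconds)
    return False


class LaggingView:
    """Hypothetical stand-in for a view that lags behind a write for a few reads."""

    def __init__(self, stale: str, fresh: str, stale_reads: int) -> None:
        self._stale = stale
        self._fresh = fresh
        self._remaining_stale_reads = stale_reads

    def read_species(self) -> str:
        # Return the old value until the simulated propagation has finished.
        if self._remaining_stale_reads > 0:
            self._remaining_stale_reads -= 1
            return self._stale
        return self._fresh


lagging_view = LaggingView(stale="", fresh="Homo sapiens", stale_reads=3)
is_consistent = wait_until_consistent(lagging_view.read_species, "Homo sapiens")
print(is_consistent)
```

Returning `False` on timeout, rather than raising, leaves the caller to decide whether a stale view is an error.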
99+
100+
## 5. Update the scope of your EntityView
101+
102+
As your project expands or contracts you will need to adjust the containers you'd like
103+
to include in your view. In order to accomplish this you may modify the `scope_ids`
104+
attribute on your view.
105+
106+
```python
107+
{!docs/tutorials/python/tutorial_scripts/entityview.py!lines=69-73}
108+
```
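The tutorial script calls `add` and `remove` on `scope_ids`, which are methods of a Python `set`. The sketch below shows those set semantics on their own, under the assumption that the scope IDs are held in a set; this diff does not show how the model stores them, and the Synapse IDs are placeholders.

```python
# Placeholder Synapse IDs; a real scope would hold project or folder IDs.
scope_ids = {"syn111"}

# Adding an ID that is already present has no effect, so repeated adds are safe.
scope_ids.add("syn1234")
scope_ids.add("syn1234")

# `remove` raises KeyError when the ID is absent; `discard` silently ignores it.
scope_ids.discard("syn9999")
try:
    scope_ids.remove("syn9999")
    remove_raised = False
except KeyError:
    remove_raised = True

print(sorted(scope_ids))
```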
109+
110+
## 6. Update the types of Entities included in your EntityView
111+
112+
You may also want to change what types of Entities may be included in your view. To
113+
accomplish this you'll be modifying the `view_type_mask` attribute on your view.
114+
115+
```python
116+
{!docs/tutorials/python/tutorial_scripts/entityview.py!lines=75-79}
117+
```
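Combining entity types with the bitwise OR operator works the way any flag enum does. The sketch below uses a hypothetical `ExampleTypeMask` with made-up values to show the mechanics; it is not the definition of `ViewTypeMask`, whose actual values are not shown in this diff.

```python
from enum import IntFlag


class ExampleTypeMask(IntFlag):
    """Hypothetical flags with illustrative values, one bit per entity type."""

    FILE = 1
    FOLDER = 2
    TABLE = 4


# OR sets the bit for each type that should appear in the view.
mask = ExampleTypeMask.FILE | ExampleTypeMask.FOLDER

# AND tests whether a given type is included in the mask.
includes_folder = bool(mask & ExampleTypeMask.FOLDER)
includes_table = bool(mask & ExampleTypeMask.TABLE)

# AND with the complement clears a single type again.
files_only = mask & ~ExampleTypeMask.FOLDER

print(int(mask), includes_folder, includes_table, files_only == ExampleTypeMask.FILE)
```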
118+
119+
## Results
120+
Now that you have created and updated your EntityView, you can inspect it in the
121+
Synapse web UI. It should look similar to:
122+
123+
![entityview](./tutorial_screenshots/entityview.png)
124+
125+
## Source code for this tutorial
126+
127+
<details class="quote">
128+
<summary>Click to show me</summary>
129+
130+
```python
131+
{!docs/tutorials/python/tutorial_scripts/entityview.py!}
132+
```
133+
</details>
134+
135+
## References used in this tutorial
136+
137+
- [EntityView](../../reference/experimental/sync/entityview.md)
138+
- [Column][synapseclient.models.Column]
139+
- [syn.login][synapseclient.Synapse.login]
140+
- [Project](../../reference/experimental/sync/project.md)

docs/tutorials/python/file_view.md

-4
This file was deleted.
@@ -0,0 +1,79 @@
1+
"""
2+
Here is where you'll find the code for the EntityView tutorial.
3+
"""
4+
5+
import pandas as pd
6+
7+
from synapseclient import Synapse
8+
from synapseclient.models import (
9+
Column,
10+
ColumnType,
11+
EntityView,
12+
Project,
13+
ViewTypeMask,
14+
query,
15+
)
16+
17+
syn = Synapse()
18+
syn.login()
19+
20+
# First let's get the project we want to create the EntityView in
21+
my_project = Project(name="My uniquely named project about Alzheimer's Disease").get()
22+
project_id = my_project.id
23+
24+
# Next let's add some columns to the EntityView. The data in these columns will end up
25+
# being stored as annotations on the files
26+
columns = [
27+
Column(name="species", column_type=ColumnType.STRING),
28+
Column(name="dataType", column_type=ColumnType.STRING),
29+
Column(name="assay", column_type=ColumnType.STRING),
30+
Column(name="fileFormat", column_type=ColumnType.STRING),
31+
]
32+
33+
# Then we will create an EntityView that is scoped to the project, and will contain a row
34+
# for each file in the project
35+
view = EntityView(
36+
name="My Entity View",
37+
parent_id=project_id,
38+
scope_ids=[project_id],
39+
view_type_mask=ViewTypeMask.FILE,
40+
columns=columns,
41+
).store()
42+
43+
print(f"My EntityView ID is: {view.id}")
44+
45+
# When the columns are printed you'll notice that it contains a number of columns that
46+
# are automatically added by Synapse in addition to the ones we added
47+
print(view.columns.keys())
48+
49+
# Query the EntityView
50+
results_as_dataframe: pd.DataFrame = query(
51+
query=f"SELECT id, name, species, dataType, assay, fileFormat, path FROM {view.id} WHERE path like '%single_cell_RNAseq_batch_1%'",
52+
include_row_id_and_row_version=False,
53+
)
54+
print(results_as_dataframe)
55+
56+
# Finally let's update the annotations on the files in the project
57+
results_as_dataframe["species"] = ["Homo sapiens"] * len(results_as_dataframe)
58+
results_as_dataframe["dataType"] = ["geneExpression"] * len(results_as_dataframe)
59+
results_as_dataframe["assay"] = ["SCRNA-seq"] * len(results_as_dataframe)
60+
results_as_dataframe["fileFormat"] = ["fastq"] * len(results_as_dataframe)
61+
62+
view.update_rows(
63+
values=results_as_dataframe,
64+
primary_keys=["id"],
65+
wait_for_eventually_consistent_view=True,
66+
)
67+
68+
69+
# Over time you may need to add or remove scopes from the EntityView. You may
70+
# use `add` or `remove` along with the Synapse ID of the scope you wish to add/remove
71+
view.scope_ids.add("syn1234")
72+
# view.scope_ids.remove("syn1234")
73+
view.store()
74+
75+
# You may also need to add or remove the types of Entities that may show up in your view.
76+
# You will be able to specify multiple types using the bitwise OR operator, or a single value
77+
view.view_type_mask = ViewTypeMask.FILE | ViewTypeMask.FOLDER
78+
# view.view_type_mask = ViewTypeMask.FILE
79+
view.store()

docs/tutorials/python_client.md

+1-1
@@ -20,7 +20,7 @@ By the end of these tutorials you'll have:
2020
- [Annotations](./python/annotation.md) added to your Project, Folders, and Files
2121
- A File with multiple [Versions](./python/versions.md)
2222
- A File that has an [Activity/Provenance](./python/activity.md) added to it
23-
- A [File View](./python/file_view.md) created for your Project
23+
- An [Entity View/File View](./python/entityview.md) created for your Project
2424
- A [Table](./python/table.md) created for your Project
2525
- [Create, Read, Update, Delete operations](./python/table_crud.md) for your table
2626
- A [Dataset](./python/dataset.md) created for your project

mkdocs.yml

+3-1
@@ -30,7 +30,7 @@ nav:
3030
- Annotation: tutorials/python/annotation.md
3131
# - Versions: tutorials/python/versions.md
3232
# - Activity/Provenance: tutorials/python/activity.md
33-
# - File View: tutorials/python/file_view.md
33+
- Entity View: tutorials/python/entityview.md
3434
# - Table: tutorials/python/table.md
3535
# - Using a Table: tutorials/python/table_crud.md
3636
- Dataset: tutorials/python/dataset.md
@@ -81,6 +81,7 @@ nav:
8181
- File: reference/experimental/sync/file.md
8282
- Table: reference/experimental/sync/table.md
8383
- Dataset: reference/experimental/sync/dataset.md
84+
- EntityView: reference/experimental/sync/entityview.md
8485
- Activity: reference/experimental/sync/activity.md
8586
- Team: reference/experimental/sync/team.md
8687
- UserProfile: reference/experimental/sync/user_profile.md
@@ -92,6 +93,7 @@ nav:
9293
- File: reference/experimental/async/file.md
9394
- Table: reference/experimental/async/table.md
9495
- Dataset: reference/experimental/async/dataset.md
96+
- EntityView: reference/experimental/async/entityview.md
9597
- Activity: reference/experimental/async/activity.md
9698
- Team: reference/experimental/async/team.md
9799
- UserProfile: reference/experimental/async/user_profile.md

synapseclient/api/entity_factory.py

+14-2
@@ -19,6 +19,8 @@
1919
)
2020

2121
if TYPE_CHECKING:
22+
from synapseclient.models import EntityView
23+
2224
from synapseclient import Synapse
2325
from synapseclient.models import Dataset, File, Folder, Project, Table
2426

@@ -32,7 +34,9 @@ async def get_from_entity_factory(
3234
download_file: bool = True,
3335
download_location: str = None,
3436
follow_link: bool = False,
35-
entity_to_update: Union["Project", "File", "Folder", "Table", "Dataset"] = None,
37+
entity_to_update: Union[
38+
"Project", "File", "Folder", "Table", "Dataset", "EntityView"
39+
] = None,
3640
*,
3741
synapse_client: Optional["Synapse"] = None,
3842
) -> Union["Project", "File", "Folder"]:
@@ -238,7 +242,7 @@ async def _cast_into_class_type(
238242
entity_to_update: Union["Project", "File", "Folder"] = None,
239243
*,
240244
synapse_client: Optional["Synapse"] = None,
241-
) -> Union["Project", "File", "Folder"]:
245+
) -> Union["Project", "File", "Folder", "Table", "Dataset", "EntityView"]:
242246
"""
243247
Take an entity_bundle returned from the Synapse API and cast it into the appropriate
244248
class type. This will also download the file if `download_file` is set to True.
@@ -363,6 +367,14 @@ class type. This will also download the file if `download_file` is set to True.
363367
entity = entity_to_update.fill_from_dict(
364368
entity=entity_bundle["entity"], set_annotations=False
365369
)
370+
elif entity["concreteType"] == concrete_types.ENTITY_VIEW:
371+
if not entity_to_update:
372+
from synapseclient.models import EntityView
373+
374+
entity_to_update = EntityView()
375+
entity = entity_to_update.fill_from_dict(
376+
entity=entity_bundle["entity"], set_annotations=False
377+
)
366378
else:
367379
raise ValueError(
368380
f"Attempting to retrieve an unsupported entity type of {entity['concreteType']}."

synapseclient/client.py

+1
@@ -6428,6 +6428,7 @@ async def _rest_call_async(
64286428
Returns:
64296429
JSON encoding of response
64306430
"""
6431+
self.logger.debug(f"Sending {method} request to {uri}")
64316432
uri, headers = self._build_uri_and_headers(
64326433
uri, endpoint=endpoint, headers=headers, is_httpx=True
64336434
)

synapseclient/core/constants/concrete_types.py

+1
@@ -67,6 +67,7 @@
6767
PROJECT_ENTITY = "org.sagebionetworks.repo.model.Project"
6868
TABLE_ENTITY = "org.sagebionetworks.repo.model.table.TableEntity"
6969
DATASET_ENTITY = "org.sagebionetworks.repo.model.table.Dataset"
70+
ENTITY_VIEW = "org.sagebionetworks.repo.model.table.EntityView"
7071

7172
# upload requests
7273
MULTIPART_UPLOAD_REQUEST = "org.sagebionetworks.repo.model.file.MultipartUploadRequest"
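The new `ENTITY_VIEW` constant is the string that `_cast_into_class_type` compares against `entity["concreteType"]` to decide which model class to build. As a rough sketch of that dispatch idea, the example below uses a dictionary lookup in place of the `elif` chain the commit extends. The `Example*` classes and `cast_entity` are hypothetical stand-ins, not the client's models.

```python
from dataclasses import dataclass
from typing import Dict, Type

# Concrete type strings copied from the constants shown in this diff.
TABLE_ENTITY = "org.sagebionetworks.repo.model.table.TableEntity"
DATASET_ENTITY = "org.sagebionetworks.repo.model.table.Dataset"
ENTITY_VIEW = "org.sagebionetworks.repo.model.table.EntityView"


@dataclass
class ExampleEntity:
    """Hypothetical base model holding only the fields this sketch needs."""

    id: str = ""
    name: str = ""

    def fill_from_dict(self, entity: dict) -> "ExampleEntity":
        self.id = entity.get("id", "")
        self.name = entity.get("name", "")
        return self


class ExampleTable(ExampleEntity):
    pass


class ExampleDataset(ExampleEntity):
    pass


class ExampleEntityView(ExampleEntity):
    pass


# Supporting a new entity type means adding one entry to this mapping.
CLASS_BY_CONCRETE_TYPE: Dict[str, Type[ExampleEntity]] = {
    TABLE_ENTITY: ExampleTable,
    DATASET_ENTITY: ExampleDataset,
    ENTITY_VIEW: ExampleEntityView,
}


def cast_entity(entity: dict) -> ExampleEntity:
    """Build the model class registered for the entity's concrete type."""
    concrete_type = entity["concreteType"]
    entity_class = CLASS_BY_CONCRETE_TYPE.get(concrete_type)
    if entity_class is None:
        raise ValueError(
            f"Attempting to retrieve an unsupported entity type of {concrete_type}."
        )
    return entity_class().fill_from_dict(entity)


view = cast_entity({"concreteType": ENTITY_VIEW, "id": "syn1", "name": "My Entity View"})
print(type(view).__name__, view.id)
```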
