add data pipeline for hydrology data #703
Comments
Just talked to @dblodgett-usgs. We should consider covJSON w/ WaterML2 use case elements |
Thanks @ksonda. CoverageJSON is default output from pygeoapi EDR support, so we would get it for free once there is an EDR plugin for a relevant backend. |
agree, sounds like the least cost path forward to me... |
A couple thoughts about this suggestion. There are two use cases here -- 1) the "Web data" use case which requires a convention to encode key elements of data for plots and some site metadata and 2) the "data exchange" use case which requires a convention to encode more precise data contents that are unique to the hydrometric station timeseries use cases supported by WaterML2 part 1. IMHO, it would be best to just use CoverageJSON for the timeseries payload and a GeoJSON-compatible json-schema for the site metadata. If there are critical metadata nuances that can not be captured in a satisfactory way in CoverageJSON, then perhaps we jump into a full json encoding of timeseriesML/WaterML2 Part 1. I'd be happy to contribute to this effort as it unfolds and really appreciate your efforts on this!! |
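To make the "CoverageJSON for the timeseries payload" suggestion concrete, here is a minimal sketch of a PointSeries coverage for a single gauge, built as a Python dict so it can be checked programmatically. The station coordinates, parameter name, and values are invented for illustration; site metadata beyond coordinates would ride in a companion GeoJSON feature, as the comment suggests.

```python
# Hypothetical daily discharge series for one station; coordinates,
# parameter name, and values are made up for illustration only.
coverage = {
    "type": "Coverage",
    "domain": {
        "type": "Domain",
        "domainType": "PointSeries",
        "axes": {
            "x": {"values": [-78.91]},   # longitude
            "y": {"values": [36.05]},    # latitude
            "t": {"values": ["2023-01-01T00:00:00Z",
                             "2023-01-02T00:00:00Z",
                             "2023-01-03T00:00:00Z"]},
        },
        "referencing": [
            {"coordinates": ["x", "y"],
             "system": {"type": "GeographicCRS",
                        "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"}},
            {"coordinates": ["t"],
             "system": {"calendar": "Gregorian"}},
        ],
    },
    "parameters": {
        "discharge": {
            "type": "Parameter",
            "unit": {"symbol": "m3/s"},
            "observedProperty": {"label": {"en": "River discharge"}},
        }
    },
    "ranges": {
        "discharge": {
            "type": "NdArray",
            "dataType": "float",
            "axisNames": ["t"],
            "shape": [3],
            "values": [12.3, 14.1, 13.7],
        }
    },
}

# Sanity check: the range must line up with the time axis.
assert len(coverage["domain"]["axes"]["t"]["values"]) == \
    coverage["ranges"]["discharge"]["shape"][0]
```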
Curious if there's room in the EDR spec to cover both use cases, given that /locations is just supposed to be a GeoJSON endpoint of some kind with the schema defined in the OpenAPI doc |
Probably yes -- for the more complex use case, the WaterML2 Part 1 time-value-pair metadata and the ability to alter default per-time-step metadata is the part that is going to be complicated, and EDR has no issue with additional media types from the .../locations endpoint. Same for .../items: additional media types for features are well supported. |
hmm, that is tricky. Could specify for each. Alternative 1: best practice, specify for each. Alternative 2: best practice, specify for each |
I need to go back and read the spec and think about it some. As an initial take, just doing the happy path CoverageJSON with as much of the WaterML2 spec as "just works" would be a really great step!! |
Straightforward:
Seems hacky but does in fact have relevant guidance in the spec:
Unclear:
EDR can maybe handle via /locations, but custom handling by the service is one thing and cross-protocol data exchange is another :( To force in covJSON options
|
That's a quite different approach to what was discussed last week during the HDWG meeting between the colleagues from the WQ IE @sgrellet, @KathiSchleidt, @hylkevds and Rob Atkinson: to move towards an update of TSML with a hydro profile/extension and a JSON encoding. I agree with @dblodgett-usgs that we need a fair bit of metadata for the data exchange use case to make it work with WHOS. For station & measurement metadata, WIS relies on WIGOS OSCAR/Surface metadata, but this is fairly complex XML and hardly implemented by the hydro community so far. I think this needs a more in-depth discussion within the HDWG and maybe beyond, as this is also relevant for other domains |
I think more discussion is good, but as part of that discussion I think it is worth seeing what is reusable or adaptable from straight geojson and covjson rather than assuming a priori we must have an entirely custom new json format from first principles. That may end up being the case, it may not. Above was just getting a start on how covJSON could fit in on the record. |
I regret that I was not able to take part in the discussion at the HDWG meeting -- family vacation took precedence. I fully expect that there is a need to do both. The Web use case could be (kind of has to be) satisfied by geojson and covjson because a boutique format won't be broadly supported / adoptable for Webby use cases. There may be a world where a JSON encoding of TSML with a WaterML2 Part 1 profile or best practice would be a critical format for data exchange but it would need to be in addition to more accessible Web formats. There also may be a world where we could establish a convention that would "just work" as geojson and covjson but I have a hard time seeing the compromises necessary for such a convention being acceptable to either Web or data exchange use cases. This is why I make the assertion up front that we probably need to do both at some level. So, let's run with use of existing accessible formats and focus on Web use cases with as much data exchange content as fits easily? |
Something we've been batting about as an experiment to reveal the opportunities and limitations of the existing constellation of standards for the "webby" use case.
Why?
|
Related to wmo-im/tt-w4h#28 |
@webb-ben and I met/discussed this today. Proposed way forward:
|
If this all pans out, it will be a very positive step. Thanks for taking it on guys! |
covJSON has the important concepts in WaterML2 other than TVP metadata, which is of arguable importance to most people and can be covered by additional parameters if necessary. SensorThingsAPI has all the concepts in WaterML2 including TVP metadata, so it would be very simple to write a simplified rewrap of STA JSON for a “complete” profile |
Agreed, we partly prototyped this during the OGC WaterQuality IE. We just shot an email to both OGC hydrodwg and tsml swg about how we could push this aspect forward. Feel free to contribute/raise interest |
One idea for TVP metadata in a covJSON context is a best practice that says, 1 coverage = 1 station with timeseries for 1 parameter. Any additional parameters shall be TVP metadata fields for that timeseries (e.g. data status or quality codes) |
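A rough sketch of what that proposed best practice could look like in CovJSON terms: one observed parameter plus a parallel quality-code range on the same time axis, so each time step pairs a value with its qualifier. The parameter names, category labels, and values are invented, and the categoryEncoding usage is simplified relative to the CovJSON spec.

```python
# Hypothetical: per the "1 coverage = 1 station, 1 parameter" convention,
# a quality-code range rides alongside the measured range on the same t axis.
parameters = {
    "discharge": {
        "type": "Parameter",
        "unit": {"symbol": "m3/s"},
        "observedProperty": {"label": {"en": "River discharge"}},
    },
    "discharge_quality": {
        "type": "Parameter",
        "observedProperty": {"label": {"en": "Discharge quality code"}},
        # Categories loosely mimic WaterML2-style qualifiers; labels invented.
        "categoryEncoding": {"approved": 0, "provisional": 1, "estimated": 2},
    },
}

ranges = {
    "discharge": {"type": "NdArray", "dataType": "float",
                  "axisNames": ["t"], "shape": [3],
                  "values": [12.3, 14.1, 13.7]},
    "discharge_quality": {"type": "NdArray", "dataType": "integer",
                          "axisNames": ["t"], "shape": [3],
                          "values": [0, 1, 1]},
}

# Both ranges share the t axis: this is WaterML2's TVP metadata,
# flattened into a second CovJSON range.
assert ranges["discharge"]["shape"] == ranges["discharge_quality"]["shape"]
```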
Sorry for chiming in late here, but I REALLY needed some vacation! Continuing on @dblodgett-usgs' UC differentiation into
We've realized for quite a while under OMS/STA that most real-world UCs split into these 2 views: first you look at the details and see if the data is fit for purpose. Once you've done that, you rarely look at this detailed view again; you just want the simplified geometry-and-a-number view. We've been chewing on such simplified result formats for STA and have done a CSV result format for such purposes; I could see using CovJSON here (or proxying the STA data via EDR with CovJSON output). The trick will be providing backlinks to the full "Data exchange" view for the case that folks want to go back to the details.

As Sylvain has mentioned, in the WQ IE we've shown how well STA works for the "Data Exchange" UC. I'd be all for doing a WaterML profile for STA, defining what attributes should be in the properties blocks of the various STA classes.

On the webby view: while I still need to take a closer look at CovJSON (my brain still thinks in the various CIS encodings), I'd like to explore providing more than one time series for one station. To my memory, CovJSON does support non-spatiotemporal dimensions - couldn't one set up one dimension for stations, one for ObsProp, one for time?

On TSML: had good discussions with Paul Hershberg during my vacation (if I'm in DC anyway, too good an opportunity to waste!). The plan is for me to co-chair TSML, build on the preliminary work Paul and I did this spring, and try to get at least the conceptual model for TSML done by the end of the year (we need to align to the updates in both the OMS and OGC Coverage models); then it can be integrated into the WaterML update while we work on the encodings.

As a first step, I'd really appreciate samples of different water time-series encodings, to see how reality aligns to the timeseries options available under TSML |
In practical terms, I think an easy-to-use software stack that provides WebUI <- "webby" covJSON <- EDR <- STA -> EDR -> "Data Exchange" TSML JSON would be a nice "house" to aim for. That being said, I do think some kind of simplified proxy layer like EDR is necessary over STA for something like TSML because OData is too flexible to provide predictable outputs IMHO. STA-based clients would have to hardcode a specific query like https://labs.waterdata.usgs.gov/sta/v1.1/Things?$filter=name%20eq%20%27USGS-02085000%27%20or%20name%20eq%20%27USGS-0209734440%27&$expand=Locations,Datastreams($expand=ObservedProperty,Sensor,Observations($top=30)) to get info roughly equivalent to WaterML2/TSML. @KathiSchleidt re covJSON for more than one obsprop per station: I think it's worth exploring. Outstanding issues from my perspective:
|
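To make the "hardcoded query" point concrete, here is a sketch assembling the STA request quoted above from its parts. The endpoint and station names come from that comment; a real client would percent-encode the query string, as in the quoted URL.

```python
# Sketch: a client wanting roughly WaterML2/TSML-equivalent output from a
# SensorThings API v1.1 service must spell out every entity it needs, at
# the right nesting level, in $filter/$expand.
BASE = "https://labs.waterdata.usgs.gov/sta/v1.1/Things"

stations = ["USGS-02085000", "USGS-0209734440"]  # from the quoted query
flt = " or ".join(f"name eq '{s}'" for s in stations)
expand = ("Locations,"
          "Datastreams($expand=ObservedProperty,Sensor,"
          "Observations($top=30))")

# Left unencoded for readability; production code should percent-encode.
url = f"{BASE}?$filter={flt}&$expand={expand}"
print(url)
```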
@ksonda trying to understand the 2 ends of your chain:
What's the difference between your 2 versions? To my view, they both look like simplified data exchange. I don't see how either version provides access to relevant observational metadata, e.g. ObservingProcedure, Observer... TSML foresees 2 types of timeseries encodings:
|
I think we agree that there can be two JSON encodings of WaterML2/TSML: one that is aligned with some kind of coverage data model for simple use cases, and one that has ObsProc, platform/host/sampling feature, observer/sensor, result-level metadata, etc. for analytical ones. I see the latter as being developed totally independently of STA rather than tied to STA. The issue is that if we want a TSML JSON flavor to include all the more detailed metadata, then for it to be a predictable, valid resultFormat in STA without missing values in nominally required fields, it would require some pretty specific query patterns in STA. EDR's query patterns are constrained enough, but its outputs flexible enough, that all you need is to declare the schema of the JSON response in the API document. e.g.
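As a hypothetical illustration of "declare the schema of the JSON response in the API document": an EDR deployment's OpenAPI description could list a custom TSML-flavoured media type alongside CoverageJSON. This is expressed as a Python dict for brevity; the TSML media type name and schema reference are invented for illustration.

```python
# Hypothetical fragment of an EDR server's OpenAPI description declaring a
# custom TSML-flavoured JSON response next to CoverageJSON. The
# "application/vnd.example.tsml+json" media type and its schema $ref are
# invented; "application/prs.coverage+json" is CoverageJSON's media type.
locations_response = {
    "200": {
        "description": "Time series for the requested location",
        "content": {
            "application/prs.coverage+json": {
                "schema": {"$ref": "#/components/schemas/CoverageJSON"}
            },
            "application/vnd.example.tsml+json": {  # hypothetical
                "schema": {"$ref": "#/components/schemas/TsmlTimeseries"}
            },
        },
    }
}

media_types = sorted(locations_response["200"]["content"])
print(media_types)
```

A client negotiates which document shape it gets via the media type alone, regardless of whether the backend happens to be STA, SQL, or flat files.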
An implementer of an EDR that is wrapping an underlying STA endpoint would have no trouble delivering this document if there is a specification of the result format and it's declared as an available format in the OpenAPI document. It's just a matter of configuring the STA query within the EDR implementation, just like you'd have to write SQL queries directly if the EDR connected directly to a SQL database. Just trying to deliver a "full" WaterML2/TSML document via an STA encoding could lead to situations like,
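One way to picture that kind of situation, as a hypothetical sketch: a "full" TSML result format needs fields whose source entities the client's STA query never expanded. The field names and field-to-entity mapping here are illustrative, not from any spec.

```python
# Hypothetical: a "full TSML" result format needs geometry and procedure
# metadata, but the client's STA query only expanded some entities.
required_sources = {"geometry": "Locations",
                    "procedure": "Sensor",
                    "values": "Observations"}

# Client expanded Datastreams/Observations but omitted Locations and Sensor.
expanded = {"Datastreams", "Observations"}

missing = {field: entity for field, entity in required_sources.items()
           if entity not in expanded}

# These are the fields the server cannot populate without either silently
# fetching extra entities or returning an incomplete document.
print(missing)
```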
How should this be handled? Will the STA server deliver the geometry of the Location of the Thing even if it wasn't asked for explicitly? Or the ObsProc information that is presumably supplied in the related Sensor entities? Will there be an error message? Will neither be supplied, and it's up to the user to always specify the exact combination of information they want? |
I'm sorry, I think I introduced too much about particular APIs like STA and EDR. I think we're in agreement that there can be a coverage/domainRange-type "webby" format and a more detailed format. Both can be pursued in parallel. Both are agnostic to underlying APIs. The above discussion about STA/EDR should be taken up once we get to the point that we're piloting something about interoperability between different data providers. I don't think STA/EDR should inform the design of either format. |
@ksonda On your statement on 2 encodings, one aligned with Coverage, one with the concepts from OMS: if you take the time to look at the TSML model, you'll see that it's always had 2 approaches, Coverage and Time-Value Pairs. While I understand your logic in trying to integrate all concepts from OMS into CovJSON, this would require:
The approach I've been working on with Paul Hershberg foresees the following encodings (with links between the simple and complex views, allowing a user to switch between the approaches depending on what data they require):
However, as we don't want to be tightly bound to encodings, it looks like we'll first be creating a logical model that takes some complexity out of the conceptual model (e.g. going to soft typing, thus avoiding all the specializations you see in the conceptual model). Then we can figure out what concrete encodings and APIs we define. Admittedly, STA requests can become complex (with great power comes great responsibility! ;) ), so it's probably valuable to provide some standard requests. The most dependable STA I have available is the one with EU Air Quality; there you can get everything you need, including location, with the following request: What are you missing?
I'm not trying to link all concepts in OMS to covJSON anymore. The EDR spec says you can provide any encoding you want, as long as you specify what it is in the API description document. I somewhat agree in principle with this
But the very nature of the second bullet point suggests to me that a "complex view" encoding needs to be specified agnostic to EDR, STA, CS, or OA-SOSA (or anything else). For that matter, I don't think the "webby" view needs to be dependent on EDR. covJSON, with some minimal best-practice addendum that allows for station name/location metadata, would be enough for the webby view.

If I am some international or national entity trying to aggregate "complex view" streamgage data from more than 5 or so subnational data providers, it's going to be more feasible for me to ask that each provider set up an arbitrary mechanism that makes sense for them to give me a consistent encoding that is supported by multiple open API standards (and can be implemented by vendors' APIs, or just custom workflows that host documents in some web-available folder) than to ask that each entity set up an STA endpoint or TBD OA-SOSA endpoint.

I think it's fine to define a logical model ahead of an encoding, but I don't want to lose sight of the fact that many clients and workflows would expect a standard encoding, or at least an encoding that is well specified in an API definition document. I agree that in general, any number of STA queries would give the information one might expect in a "complex view". I just want the "complex view", in a consistent encoding, to be able to be made available from STA, and EDR, and custom APIs, and non-API modes of data exchange.

I'm not missing anything content-wise from that STA request (supposing the provider has all the metadata in there). I just think it's putting the cart before the horse. I want there to be a standard encoding (that is JSON but not necessarily covJSON) such that

The problem is more that, as you said, you'd need that to be a "standard request" to ensure that such a requested resultFormat from STA would actually give you all that info. And it's an open question how you would document that in a machine-readable way |
Which I interpret as: with no element specific to the API providing it. Still, we haven't made progress on it -> why? Because it's super hard to get rid of all the APIs' specific "decorations"? |
I would say that what is described here is actually a third option next to the "Webby" and "Complex" views: a standardised export format. One problem is to define what needs to go into this format, and how one specifies which sub-selection of all data in a service one is interested in, or what the default "selection" is. Since there is no possibility to add nextLinks or other forms of pagination, there has to be some other way to keep the files at a manageable size. The second problem is that these files will, by their nature, be quite verbose and contain much duplicate data, since each file will have to be self-contained. I do think that TSML is a very good candidate for this, as would be a potential JSON encoding for OMS. |
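The pagination concern can be made concrete with a sketch: without nextLinks, an export has to be pre-chunked, and every chunk must repeat the station metadata to stay self-contained, which is exactly the verbosity trade-off noted above. All names and values here are invented.

```python
# Hypothetical: split a long observation series into self-contained export
# files. Each chunk duplicates the station metadata, trading verbosity for
# files that stand alone without pagination links.
station = {"id": "example-station-001", "name": "Example Gauge"}  # invented
observations = [{"t": f"2023-01-{d:02d}", "v": float(d)}
                for d in range(1, 11)]

CHUNK = 4  # max observations per export file, chosen arbitrarily
exports = [
    {"station": station, "observations": observations[i:i + CHUNK]}
    for i in range(0, len(observations), CHUNK)
]

print(len(exports))  # 10 observations at 4 per file -> 3 files
```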
+1 on a standardized (export) format. I think the separation of encoding and API is key to enable publication of hydrological observation data either without having to set up one or several standard web services, or by re-using existing, non-standard web services (which will be the majority of data providers in the foreseeable future). With respect to WIS2 and WIS2box, the idea is that the National Met Service runs the WIS2box node and the data provider, e.g. the National Hydrological Service, provides file-based metadata and observation data that is then translated into the standard-compliant metadata and observation data file(s). WIS2 uses GeoJSON discovery metadata at the dataset level (using WCMP 2, https://wmo-im.github.io/wcmp2/standard/wcmp2-DRAFT.html), linking to the data, either served as files from a WAF or through APIs. For the APIs there are templated links (https://wmo-im.github.io/wcmp2/standard/wcmp2-DRAFT.html#_1_19_2_templated_links) that can be used to guide data access. Not sure whether this covers @ksonda's thoughts on separation and documentation, but it could serve as a starting point. |
We have two things floating around in this thread -- one of which I think is not being stated outright, and I want to clarify it.

What we are saying out loud: the distinction between Web use cases, where convenience for common Web use cases is paramount, and data exchange use cases, where precision of observation documentation is the primary (but still not dogmatically important) objective. At the end of the day, there is an adoption dynamic that we must confront: the Web will not wait for the standards community, but data exchange has requirements that necessitate more care in standardization.

What is discussed above without naming the design consideration is the distinction between APIs that allow you to define your document structure via API constructs and those that provide domain filtering only. The pattern of tightly coupling APIs to payload is super useful in some contexts (see the explosion of GraphQL and OData) but creates considerable hurdles when trying to pursue broad interoperability at the community (e.g. hydroscience) and cross-community (e.g. emergency response) levels. As Kyle illustrated, it's unrealistic to expect every hydro-met service to adopt the same coupled API/payload pattern in pursuit of community-level interoperability. The pattern of decoupling content from API is limiting in that you get what you get and can't (without anti-patterns or overloads) restrict the content returned. That's a fair critique -- but the separation of concerns and resulting architectural freedom it creates is worth every penny of sacrifice.
We can still offer "lightweight" documents through extended format lists (e.g.
This is all to say that it is critical that we seek out an arrangement where the resource model we define as our interoperability target is not coupled to the Web API that we are going to use to filter the parameter and spatiotemporal range of data. By extension, that interoperability target needs to be holistic. Within that, if we are going to capture Web use cases, we can't load up a document with stuff that will slow down or confuse Web developers and their applications (Webby) and we also need to define a way for those who need a fuller picture to share and / or access that picture in a precise and complete form. There's been quite a lot of piling on to this thread -- Can someone suggest another venue to continue this discussion? Perhaps the TSML github? We should yield the floor to @tomkralidis and @webb-ben to continue as was laid out here: #703 (comment) |
Add pipeline(s) to:
Notes: