ETL Viewer #1485
base: xdmod10.5
@ryanrath, I am excited to hear about this feature...super cool.
Here's a super rough concept that I worked on for a way of visualizing info for the ETL Viewer. The big problem that I've run into so far w/ this method has been working out how to automatically lay out the elements for any given pipeline. I manually selected the data for this image to illustrate the basic concept of what I was looking to achieve, but it turns out that getting Cytoscape to lay things out in this 3-column manner is somewhat... challenging, to say the least. Ultimately I think that I'd need to write a custom layout engine (which Cytoscape supports) if this is something we want to move forward with. If anybody has any other ideas / thoughts please hit me up, as I would really love to get further input on this.
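One possible stopgap before writing a full custom layout engine is to compute node positions by hand and feed them to Cytoscape's built-in `preset` layout. The sketch below assumes, hypothetically, that each node carries a `column` value of `source`, `action`, or `destination`; those names, the node ids, and the spacing defaults are illustrative and do not come from this PR.

```javascript
// Hypothetical column names; the PR does not define these.
const COLUMN_ORDER = ['source', 'action', 'destination'];

// Build an { id: { x, y } } map that places each node in one of three
// vertical columns, stacking nodes within a column from top to bottom.
function threeColumnPositions(nodes, columnWidth = 300, rowHeight = 80) {
    const nextRow = { source: 0, action: 0, destination: 0 };
    const positions = {};
    for (const node of nodes) {
        const columnIndex = COLUMN_ORDER.indexOf(node.column);
        if (columnIndex === -1) {
            throw new Error('Unknown column: ' + node.column);
        }
        positions[node.id] = {
            x: columnIndex * columnWidth,
            y: nextRow[node.column] * rowHeight
        };
        nextRow[node.column] += 1;
    }
    return positions;
}

// Placeholder node ids, for illustration only.
const examplePositions = threeColumnPositions([
    { id: 'table_a', column: 'source' },
    { id: 'table_b', column: 'source' },
    { id: 'ingest_action', column: 'action' },
    { id: 'table_c', column: 'destination' }
]);
```

The resulting map could then be handed to Cytoscape as `layout: { name: 'preset', positions: examplePositions }`. This only covers placement; it does nothing about edge crossings or very tall columns.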
4dd0662 to 258a82f
In PHP 7.2 only variables are allowed to be passed by reference. Wherever we were passing the result of a function call by reference, these changes first assign that result to a variable and then pass the variable.
PHP 7.2 now raises a warning when `count` is called on something that is not an array and does not implement `Countable`. The `UserOrganizationTest` changes should have been made previously, as the `execute` function is documented as returning an integer. The `Utilities` change was necessitated by `args` sometimes not being countable.
- Column.php: There was a problem with the way column statements were constructed for datetime columns with default values. Namely, the default value was being wrapped in single quotes, which causes problems in MariaDB 10.3.17.
- Table.php: This change was necessitated because MariaDB diverged from MySQL in how nulls are stored in the information schema.
- cloud_ingest_resource_specs.json: MariaDB datetime columns have a default format that cannot be changed, and this was preventing the data from being ingested. This change formats the incoming data appropriately.
- staging.json: See cloud_ingest_resource_specs.json.
- **/schema-version-history.json: This probably never worked but just hadn't been caught because errors were not thrown. We've removed the `action_datetime` column from these data files and updated their associated tables to have a default value of `CURRENT_TIMESTAMP()`, which works now because of the changes made to Column.php & Table.php.
- usage.schema.json: Like the cloud_ingest_resource_specs.json change, this is due to the default format for MariaDB datetime columns.
Not sure why this wasn't caught before, but some of the values being inserted into this column were too large for `int`. After speaking with Greg we decided the easiest fix for now would be to update the column type to `bigint`.
- **/*-aggregation*.json: The SQL statements constructed from these configuration files were throwing a division by zero error. To resolve this, an `IF` statement was added that ensures the `task.wallduration` value is always at least 1.
- post_ingest_update.sql: This statement wasn't operating as intended. After some investigation I found that `INTERVAL 1 SECOND` was not supported and, after discussion w/ Greg and some testing, changed it to `- 1` as the column values are already in seconds.
The original value of `NOW()` was being interpreted as a literal string rather than as a function call. This caused user creation and updates to fail because an incompatible value was being provided for the `password_last_updated` column. Replacing it w/ a PHP call to `date` means that we provide a valid value in the form `Y-m-d H:i:s`. **NOTE: there will be a follow-on PR that refactors the `getUpdateQuery` and `getInsertQuery` functions into a single `getQuery` function, which greatly simplifies the code responsible for generating the SQL statement for creating / updating a user and makes it more readable.**
All of these changes are due to the versions of software being updated on Centos8 vs. Centos7.
- normalized_table_definition.json: Default values are now returned as single-quoted strings in MariaDB 10.3.17.
- UsageExplorerTest.php: w/ PHP 7.2 installed these response header values have changed and needed to be updated.
- RegressionTestHelper.php: w/ PHP 7.2 installed, when dealing with exported CSV data we sometimes encounter, and expect, JSON data instead. For instance, when a user requests data that they do not have access to, a JSON object will be returned. Unfortunately we compare this JSON data as strings and not as objects / arrays. This, coupled with JSON Pretty Print not including a new line after the opening square bracket of an empty array (example below), meant that these tests were failing.

```json
{
    "property1": "",
    "property2": [
    ]
}
```
Versus
```json
{
    "property1": "",
    "property2": []
}
```

I've just added a secondary test, used if the initial `$expected === $csvdata` condition fails, that checks whether `$expected` / `$csvdata` are actually JSON data.
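The fallback comparison described for RegressionTestHelper.php could look roughly like the following. This is a JavaScript sketch of the idea, not the PHP code in this PR, and the function names are hypothetical.

```javascript
// Try to parse a string as JSON; return undefined when it is not valid JSON.
function tryParseJson(text) {
    try {
        return JSON.parse(text);
    } catch (e) {
        return undefined;
    }
}

// Compare two strings exactly first; if that fails and both are valid JSON,
// compare the parsed data so that whitespace differences do not matter.
function matchesExpected(expected, actual) {
    if (expected === actual) {
        return true;
    }
    const expectedJson = tryParseJson(expected);
    const actualJson = tryParseJson(actual);
    if (expectedJson === undefined || actualJson === undefined) {
        return false;
    }
    // Re-serializing normalizes whitespace. Key order is preserved, so two
    // objects with the same keys in a different order will NOT match here.
    return JSON.stringify(expectedJson) === JSON.stringify(actualJson);
}
```

For example, `matchesExpected('{"property2": [\n]}', '{"property2": []}')` succeeds even though the two strings differ, while two different CSV strings still fail the comparison.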
These changes were made so that the ETL tests produce the same logging as we expected previously.
Just bumping the version of chromedriver available w/ Centos8
Sooo `self::createLogger` doesn't actually return anything so there's no point in setting `self::$logger = nothing`.
MariaDB 10.2+ started reporting default values as quoted, which broke the ETL table creation / modification code. These changes detect whether quotes are included and strip them, thus allowing our code to work again.
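The quote handling amounts to something like the sketch below. It is written in JavaScript purely to illustrate the logic; the actual change lives in the PHP ETL code, and the function name here is hypothetical.

```javascript
// Remove one pair of matching single or double quotes that wraps a default
// value, if present. Non-string values (e.g. null) are returned unchanged.
function stripWrappingQuotes(defaultValue) {
    if (typeof defaultValue !== 'string' || defaultValue.length < 2) {
        return defaultValue;
    }
    const first = defaultValue[0];
    const last = defaultValue[defaultValue.length - 1];
    if ((first === "'" || first === '"') && first === last) {
        return defaultValue.slice(1, -1);
    }
    return defaultValue;
}
```

With this, a default reported as `'0'` (quotes included) becomes `0`, while an unquoted default such as `CURRENT_TIMESTAMP` passes through untouched.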
These changes only work on MariaDB 10.0+ (or MySQL 5.6.5+), as that's when `datetime` columns were first allowed to have default values of `NOW()` or `CURRENT_TIMESTAMP()`.
These changes allow our RPM to be built for / installed on either Centos7 or 8.
So upon further testing it appears that `finfo_buffer` w/ `FILEINFO_MIME` returns different values depending on whether it's PHP7 or 8. These changes allow our tests to account for these differences.
The content_type reported by PHP7 for xml is `text/xml`, while in PHP8 it's reported as `text/xml;charset=UTF-8`. There are two ways we could handle this: one is with the changes included in this commit; the other is to update `ExportBuilder.php::$supported_formats["xml"]["render_as"]` to include `charset=UTF-8`.
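One way a test could tolerate both header values is to compare only the media type and ignore any parameters such as `charset`. The sketch below illustrates that idea in JavaScript; it is not necessarily what this commit does, and the function names are hypothetical.

```javascript
// Return the media type without parameters, lowercased,
// e.g. "text/xml;charset=UTF-8" becomes "text/xml".
function baseContentType(contentType) {
    return contentType.split(';')[0].trim().toLowerCase();
}

// Two content types are treated as equivalent when their media types match,
// regardless of charset or other parameters.
function sameContentType(expected, actual) {
    return baseContentType(expected) === baseContentType(actual);
}
```

The trade-off is that this would also hide a genuinely wrong charset, which is one argument for the alternative of updating `render_as` instead.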
…urce
- Updated the url for the TreeGrid to point at `/etl/pipelines/actions`.
- Updated the `getActionsforPipelines` method chain so that it produces output suitable for use in an ExtJS TreeGrid.
- The change @ ubccr#201 was needed due to `get_object_vars` returning a keyed array `<property> => <value>`.
- ubccr#289: This function preps the output for use in an ExtJS TreeGrid.
- ubccr#453: This was to simplify / declutter the display of the action name from its fully qualified form `xdmod.pipeline.action` to `action`, as the nodes immediately preceding the action node will already be displaying the full pipeline name.
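To make the shape of that output concrete, here is a small JavaScript sketch of the two transformations mentioned above: shortening a fully qualified action name, and turning `<property> => <value>` pairs into nested tree nodes. The server-side code in this PR is PHP; the function names and the `value` attribute below are assumptions for illustration, while `text`, `leaf`, and `children` are the usual ExtJS tree node attributes.

```javascript
// "xdmod.pipeline.action" -> "action"; names without a dot are returned as-is.
function shortActionName(qualifiedName) {
    const lastDot = qualifiedName.lastIndexOf('.');
    return lastDot === -1 ? qualifiedName : qualifiedName.slice(lastDot + 1);
}

// Convert a plain object (or array) of <property> => <value> pairs into a
// list of tree nodes. Nested objects become branch nodes with children;
// everything else becomes a leaf carrying its value as a string.
function toTreeNodes(data) {
    return Object.keys(data).map(function (key) {
        const value = data[key];
        if (value !== null && typeof value === 'object') {
            return { text: key, leaf: false, children: toTreeNodes(value) };
        }
        // `value` is a hypothetical attribute a TreeGrid column could display.
        return { text: key, value: String(value), leaf: true };
    });
}
```

For instance, `toTreeNodes({ name: 'x', options: { a: 1 } })` yields one leaf for `name` and one branch for `options` containing a single leaf `a`.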
Just adding Cytoscape & the Dagre layout engine that we'll be using for the visualization of Pipelines / Actions.
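As a rough idea of how pipeline data could be handed to Cytoscape, the sketch below builds element definitions (nodes plus edges) for a single pipeline. The input shape `{ name, actions }` and the id scheme are assumptions made for this example, not the structure this PR actually returns.

```javascript
// Build Cytoscape element definitions for one pipeline: a node for the
// pipeline, a node per action, and an edge from the pipeline to each action.
// Assumed (hypothetical) input shape: { name: 'pipeline', actions: ['a', 'b'] }
function pipelineToElements(pipeline) {
    const elements = [{ data: { id: pipeline.name, label: pipeline.name } }];
    pipeline.actions.forEach(function (action) {
        const actionId = pipeline.name + '.' + action;
        elements.push({ data: { id: actionId, label: action } });
        elements.push({
            data: {
                id: pipeline.name + '->' + actionId,
                source: pipeline.name,
                target: actionId
            }
        });
    });
    return elements;
}
```

With Cytoscape and the cytoscape-dagre extension loaded, the result could be rendered with something like `cytoscape({ container: someElement, elements: pipelineToElements(p), layout: { name: 'dagre' } })`, where `someElement` and `p` are placeholders.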
This is to support a keyboard shortcut for expanding or collapsing an entire tree item in the ETL Viewer.
- Removed the File Select / Search Panel as this isn't being used atm.
- Added the ability for users to shift+click a tree node to recursively expand or collapse all of its child nodes.
- Added the ability for the ETLViewer's tree (Ext.ux.tree.TreeGrid) to automatically expand a column to take up the remaining space via the `autoExpandColumn` property.
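The shift+click behaviour can be sketched as a small click handler. This assumes the ExtJS 3 tree node methods `isExpanded()`, `expand(deep)`, and `collapse(deep)`, and an event object exposing `shiftKey`; the handler name is hypothetical and this is not the code from this commit.

```javascript
// Click handler sketch: when shift is held, recursively collapse the node if
// it is currently expanded, otherwise recursively expand it.
function onNodeClick(node, event) {
    if (!event.shiftKey) {
        return; // plain clicks keep their default behaviour
    }
    if (node.isExpanded()) {
        node.collapse(true); // true = also collapse all descendants
    } else {
        node.expand(true); // true = also expand all descendants
    }
}
```

It could be wired up with something like `tree.on('click', onNodeClick)`. Note that a deep expand on an asynchronously loaded tree may trigger a request per unloaded branch.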
Made some good progress on the Graph Panel and I don't want to lose it.
Have things in enough of a shape for people to get an idea of how things work.
Ultimately I don't think we're going to want to filter the TreeView in place due to the asynchronous nature of the loading. We can hide nodes that haven't been shown yet, but it's proving difficult to show them after they've been hidden. Not only that, but viewing the subsequently filtered data is problematic, again due to the asynchronous nature of the tree. A user can open each pipeline to see which nodes matched underneath, but they'd need to click each one individually. I attempted to add an 'expandAll' after the filter process, but even with a number of nodes hidden it takes a loooong time to finish (5 seconds or more).

I think that a more useful display for a search would be a table w/ a minimum of two columns: one that's the path to the node / attribute that matches the search term, and the other being the actual value that matched. We could then maybe provide some context actions for navigating directly to the node in the TreeView. This would be relatively easy as the tree already has an `expandPath` function. There could also be an `Open Pipeline | Action` feature that opens the pipeline / action in the Graph View.
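A minimal sketch of how the rows for such a two-column search table could be produced from already-loaded data is shown below. It assumes the pipeline / action data is available as a nested plain object and that `/` is an acceptable path separator; both the function name and the sample data are hypothetical.

```javascript
// Walk a nested object/array and collect { path, value } rows for every leaf
// whose value contains the search term (case-insensitive).
function searchTree(data, term, path = '') {
    const needle = term.toLowerCase();
    const rows = [];
    Object.keys(data).forEach(function (key) {
        const value = data[key];
        const childPath = path + '/' + key;
        if (value !== null && typeof value === 'object') {
            // Recurse into nested objects and arrays.
            rows.push(...searchTree(value, term, childPath));
        } else if (String(value).toLowerCase().includes(needle)) {
            rows.push({ path: childPath, value: String(value) });
        }
    });
    return rows;
}

// Placeholder data, for illustration only.
const exampleRows = searchTree(
    {
        pipeA: { action1: { class: 'ExampleIngestor', enabled: true } },
        pipeB: { action2: { class: 'Other' } }
    },
    'ingest'
);
```

Each row's `path` could then drive a "go to node" context action, for example by passing it (suitably prefixed with the root node's id) to the tree's `expandPath` function.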
**NOTE: this PR also includes the Centos 8 updates, which reminds me that I need to upload the centos8 docker image I'm using so that people can actually try this out. I'll be trying to upload that image overnight, so hopefully it'll be up and ready by Monday (2021.01.18) if people are interested in taking a look.**
NOTE the 2nd: If anybody has any features / functionality they would like to see incorporated please let me know
Description
I figured it was about time to get what I have for this up so that other people can take a look at it. You'll need to spin this up in Docker, then log in to the admin interface, and you should see a new tab, "ETL Viewer". Right now it just displays every pipeline / action / action details in a TreeGrid. You can ignore the "File" drop down on the right side of the interface. In the beginning I had it in my head that we'd want to start looking at things on a file by file basis, hence this dropdown. But the code really isn't set up to do this at all, so I just pivoted to viewing the final data structures produced by the ETLConfiguration class.
I think that an easy next step would be to add a "Search" box that maybe searches each node & expands the nodes that match? Or it could be that it just returns all the matches and which pipeline / action each was found in. After which, if you click a search result, it will expand / take you to the node you selected in the overall data structure?
Motivation and Context
The ETL Config system is pretty spiff, but it can be hard to visualize / figure out where everything is / how it goes together.
Ultimately I want to add a visualization that makes it easy to understand where data is coming from & where it's going for each pipeline / action. I'll be adding a couple of screenshots of what I've experimented with so far.
Tests performed
Checklist: