ETL Viewer #1485

ryanrath · 2021-01-15T16:41:42Z

**NOTE: this PR also includes the Centos 8 updates. Which reminds me that I need to upload the centos8 docker image I'm using so that people can actually try this out. I'll be trying to upload that image overnight, so hopefully it'll be up and ready by Monday (2021.01.18) if people are interested in taking a look.**

NOTE the 2nd: If anybody has any features / functionality they would like to see incorporated, please let me know.

Description

I figured it was about time to get what I have for this up so that other people can take a look at it. You'll need to spin this up in a docker container, then log in to the admin interface, where you should see a new tab, "ETL Viewer". Right now it just displays every pipeline / action / action details in a TreeGrid. You can ignore the "File" drop down on the right side of the interface. In the beginning I had it in my head that we'd want to start looking at things on a file-by-file basis, hence this dropdown. But the code really isn't set up to do this at all, so I pivoted to viewing the final data structures produced by the ETLConfiguration class.

I think that an easy next step would be to add a "Search" box that searches each node & expands the nodes that match. Or it could just return all the matches along with the pipeline / action each one was found in. Then, if you click a search result, it would expand / take you to the selected node in the overall data structure.

Motivation and Context

The ETL Config system is pretty spiff, but it can be hard to visualize / figure out where everything is / how it goes together.
Ultimately I want to add a visualization that makes it easy to understand where data is coming from & where it's going for each pipeline / action. I'll be adding a couple of screenshots of what I've experimented with so far.

Tests performed

Checklist:

The pull request description is suitable for a Changelog entry
The milestone is set correctly on the pull request
The appropriate labels have been added to the pull request

jsperhac · 2021-01-15T19:08:39Z

@ryanrath, I am excited to hear about this feature...super cool.

ryanrath · 2021-01-15T20:05:42Z

Here's a super rough concept that I worked on for a way of visualizing info for the ETL Viewer.

The big problem that I've run into so far w/ this method has been working out how to automatically lay out the elements for any given pipeline. I manually selected the data for this image to illustrate the basic concept of what I was looking to achieve, but it turns out that working out how to get cytoscape to lay things out in this 3 column manner is somewhat... challenging to say the least. Ultimately I think that I'd need to write a custom layout engine (which cytoscape supports) if this was something we wanted to move forward with.
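As a rough illustration of the 3 column idea, here is a minimal sketch that computes fixed positions per column in plain JavaScript. It is not the custom layout engine mentioned above; it swaps in Cytoscape's built-in `preset` layout, which accepts a map of node id to position. The helper name `columnPositions`, the gap sizes, and the node ids are all hypothetical, and it assumes something else has already decided which column each node belongs in.

```javascript
// Hypothetical helper: assigns x/y positions so nodes fall into fixed columns.
// `columns` is an array of arrays of node ids, ordered left to right.
function columnPositions(columns, colGap = 250, rowGap = 80) {
  const positions = {};
  columns.forEach((ids, col) => {
    // Center each column vertically around y = 0.
    const offset = ((ids.length - 1) * rowGap) / 2;
    ids.forEach((id, row) => {
      positions[id] = { x: col * colGap, y: row * rowGap - offset };
    });
  });
  return positions;
}

// Hypothetical node ids: inputs on the left, the action in the middle,
// the destination on the right.
const positions = columnPositions([
  ['source_a', 'source_b'],
  ['action'],
  ['dest_table'],
]);

// With Cytoscape loaded, the map could be handed to the preset layout:
// cy.layout({ name: 'preset', positions: positions }).run();
```

This sidesteps automatic placement entirely, so it only helps if the column membership of each node can be derived from the pipeline / action data.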

If anybody has any other ideas / thoughts please hit me up as I would really love to get further input on this.

PHP 7.2 requires that only variables be passed by ref. Wherever we were passing the result of a function by-ref, these changes first assign the result to a variable and then pass that variable.

PHP 7.2 now emits a warning when `count` is used with something that is neither an array nor a `Countable`. The `UserOrganizationTest` changes should have been made previously, as the `execute` function is documented as returning an integer. The `Utilities` change was necessitated by `args` sometimes not being countable.

- Column.php: There was a problem with the way column statements were constructed for datetime columns w/ default values. Namely, the default value was being wrapped in single quotes, which causes problems in MariaDB 10.3.17.
- Table.php: This change was necessitated because MariaDB diverged from MySQL in how nulls are stored in the information schema.
- cloud_ingest_resource_specs.json: MariaDB datetime columns have a default format that cannot be changed, and that was preventing this data from being ingested. This change formats the incoming data appropriately.
- staging.json: See cloud_ingest_resource_specs.json.
- `**/schema-version-history.json`: This was something that probably never worked but just hadn't been caught because errors were not thrown. We've removed the `action_datetime` column from these data files and updated their associated tables to have a default value of `CURRENT_TIMESTAMP()`, which works now because of the changes made to Column.php & Table.php.
- usage.schema.json: Like the cloud_ingest_resource_specs.json change, this is due to the default format for MariaDB datetime columns.

Not sure why this wasn't caught before but the values being inserted into this column included values that were too large for `int`. After speaking with Greg we decided the easiest fix for now would be to update the column type to `bigint`.

- `**/*-aggregation*.json`: The SQL statements constructed from these configuration files were throwing a division by zero error. To resolve this, an `IF` statement was added that ensures the `task.wallduration` value is always at least 1.
- post_ingest_update.sql: This statement wasn't operating as intended. After some investigation I found that the `INTERVAL 1 SECOND` was not supported and, after discussion w/ Greg and some testing, changed it to `- 1` as the column values are already in seconds.

The original value of `NOW()` was being interpreted as a literal string value rather than as a function call, which caused user creation and updates to fail due to an incompatible value being provided for the `password_last_updated` column. Replacing this w/ a PHP call to `date` means that we provide a valid value in the form `Y-m-d H:i:s`. **NOTE: there will be a follow-on PR that refactors the `getUpdateQuery` and `getInsertQuery` functions into a single `getQuery` function, which greatly simplifies the code responsible for generating the SQL statement for creating / updating a user and makes it more readable.**

All of these changes are due to the versions of software being updated on Centos8 vs. Centos7.

- normalized_table_definition.json: Default values are now returned as single-quoted strings in MariaDB 10.3.17.
- UsageExplorerTest.php: w/ PHP 7.2 installed these response header values have changed and needed to be updated.
- RegressionTestHelper.php: w/ PHP 7.2 installed, when dealing with exported CSV data we sometimes encounter and expect JSON data instead. For instance, when a user requests data that they do not have access to, a JSON object will be returned. Unfortunately we compare this JSON data as strings and not as objects / arrays. On top of that, JSON Pretty Print no longer includes a new line after the opening square bracket of an empty array (example below):

```json
{
    "property1": "",
    "property2": [
    ]
}
```

Versus

```json
{
    "property1": "",
    "property2": []
}
```

This meant that these tests were failing. I've added a secondary check, used if the initial `$expected === $csvdata` condition fails, that tests whether `$expected` / `$csvdata` are actually JSON data.
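The fallback idea for RegressionTestHelper.php can be sketched as follows: try the exact string match first, and only when that fails compare the two sides as parsed JSON, so whitespace differences in empty arrays no longer matter. This is an illustration in JavaScript rather than the PHP in this PR, and the names `tryParseJson` and `sameExport` are made up for the example.

```javascript
const { isDeepStrictEqual } = require('util');

// Returns the parsed value, or undefined when `text` is not valid JSON.
function tryParseJson(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return undefined;
  }
}

// Hypothetical comparison helper: exact string match first, then a
// structural comparison when both sides parse as JSON.
function sameExport(expected, actual) {
  if (expected === actual) {
    return true;
  }
  const a = tryParseJson(expected);
  const b = tryParseJson(actual);
  return a !== undefined && b !== undefined && isDeepStrictEqual(a, b);
}
```

Plain CSV text fails to parse on both sides, so for non-JSON exports the result is still decided by the string comparison alone.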

These changes were made so that the ETL tests produce the same logging as we expected previously.

Just bumping the version of chromedriver available w/ Centos8

Sooo `self::createLogger` doesn't actually return anything so there's no point in setting `self::$logger = nothing`.

MariaDB 10.2+ started reporting default values as quoted, which broke the ETL table creation / modification code. These changes detect whether quotes are included and strip them, thus allowing our code to work again.
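A minimal sketch of the detect-and-strip step, written in JavaScript for illustration (the actual change is in the PHP / SQL column detection code, and `stripQuotes` is a hypothetical name). It assumes a default value is wrapped in at most one pair of matching quotes.

```javascript
// Strips one pair of matching surrounding quotes, if present.
// Anything that is not a quoted string is returned unchanged.
function stripQuotes(value) {
  if (typeof value !== 'string' || value.length < 2) {
    return value;
  }
  const first = value[0];
  const last = value[value.length - 1];
  if ((first === "'" || first === '"') && first === last) {
    return value.slice(1, -1);
  }
  return value;
}
```

Unquoted values such as `current_timestamp()` and `null` defaults pass through untouched. Escaped quotes inside the value are not handled here.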

These changes only worked on MariaDB 10.0+ ( or MySQL 5.6.5 ) as that's when MariaDB allowed `datetime` columns to have default values of `NOW()` or `CURRENT_TIMESTAMP()`.

These changes allow our RPM to be built for / installed on either Centos7 or 8.

So upon further testing it appears that `finfo_buffer` w/ `FILEINFO_MIME` returns different values depending on whether it's PHP7 or 8. These changes allow our tests to account for these differences.

The content_type reported by PHP7 for xml is `text/xml` while in PHP8 it's reported as `text/xml;charset=UTF-8`. There are two ways we could handle this, one is with the changes included in this commit. The other is that we can update `ExportBuilder.php::$supported_formats["xml"]["render_as"]` to include `charset=UTF-8`.
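For illustration only, one way to make a comparison tolerant of the optional `charset` parameter is to compare just the media type portion of each value. This sketch is not necessarily what the commit does; `mediaType` and `sameMediaType` are hypothetical helpers.

```javascript
// Returns the bare media type, dropping any parameters such as charset.
function mediaType(contentType) {
  return contentType.split(';')[0].trim().toLowerCase();
}

// True when two Content-Type style values name the same media type.
function sameMediaType(a, b) {
  return mediaType(a) === mediaType(b);
}
```

The trade-off is that a genuinely wrong charset would no longer be caught, which updating `$supported_formats` would avoid.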

…urce

- Updated the url for the TreeGrid to point at `/etl/pipelines/actions`.
- Updated the `getActionsforPipelines` method chain so that it produces output suitable for use in an ExtJS TreeGrid.
- The change @ ubccr#201 was needed due to `get_object_vars` returning a keyed array `<property> => <value>`.
- ubccr#289: This function preps the output for use in an ExtJS TreeGrid.
- ubccr#453: This simplifies / declutters the display of the action name, from its fully qualified form `xdmod.pipeline.action` to `action`, as the nodes immediately preceding the action node will already be displaying the full pipeline name.
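The two transformations described above can be sketched like this: shortening a fully qualified action name, and turning a keyed `<property> => <value>` structure into nested nodes. This is a JavaScript illustration of the shape of the data, not the PHP in this PR. The node fields `name` and `value` are hypothetical column names, and the sketch assumes the action name is whatever follows the last `.`.

```javascript
// 'xdmod.pipeline.action' -> 'action' (assumes no '.' in the action name).
function shortActionName(qualified) {
  const idx = qualified.lastIndexOf('.');
  return idx === -1 ? qualified : qualified.slice(idx + 1);
}

// Converts a keyed object into an array of tree nodes: nested objects and
// arrays become branches, everything else becomes a leaf.
function toTreeNodes(data) {
  return Object.entries(data).map(([key, value]) => {
    if (value !== null && typeof value === 'object') {
      return { name: key, leaf: false, children: toTreeNodes(value) };
    }
    return { name: key, value: String(value), leaf: true };
  });
}
```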

Just adding Cytoscape & the Dagre layout engine that we'll be using for the visualization of Pipelines / Actions.

This is to support a keyboard shortcut for expanding or collapsing an entire tree item in the ETL Viewer.

- Removed the File Select / Search Panel as this isn't being used atm.
- Added the ability for users to shift+click a tree node to recursively expand or collapse all of its child nodes.
- Added the ability for the ETLViewer's tree (Ext.ux.tree.TreeGrid) to automatically expand a column to take up the remaining space via the `autoExpandColumn` property.
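The shift+click behavior might look roughly like the sketch below. It assumes the node exposes `isExpanded()`, `expand(deep)` and `collapse(deep)`, as Ext JS 3 tree nodes do, and that the caller already knows whether shift is held. `toggleNode` is a hypothetical name, not a function from this PR.

```javascript
// Toggles a tree node. When shift is held, the toggle is applied
// recursively to all child nodes via the `deep` argument.
function toggleNode(node, shiftHeld) {
  const deep = shiftHeld === true;
  if (node.isExpanded()) {
    node.collapse(deep);
  } else {
    node.expand(deep);
  }
}
```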

Made some good progress on the Graph Panel and I don't want to lose it.

Have things in enough of a shape for people to get an idea of how things work

Ultimately I don't think we're going to want to filter the TreeView in place due to the asynchronous nature of the loading. We can hide nodes that haven't been shown yet, but it's proving difficult to show them after they've been hidden. Not only that, but viewing the subsequently filtered data is problematic, again due to the asynchronous nature of the tree. A user can open each pipeline to see which nodes matched underneath, but they'd need to click each one individually. I attempted to add an 'expandAll' after the filter process, but even with a number of nodes hidden it takes a loooong time to finish (5 seconds or more).

I think that a more useful display for a search would be a table w/ a minimum of two columns, one that's the path to the node / attribute that matches the search term and the other being the actual value that matched. We could then maybe provide some context actions for navigating directly to the node in the TreeView. This would be relatively easy as the tree already has an `expandPath` function. There could also be an `Open Pipeline | Action` feature that opens the pipeline / action in the Graph View.
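To make the two-column result idea concrete, here is a minimal sketch that walks an already-loaded nested structure and returns `{ path, value }` rows for every leaf whose key or value contains the search term. The function name `searchTree`, the `/` path separator, and the sample data are all assumptions for illustration. Whether these paths line up with what `expandPath` expects depends on how node ids are assigned in the tree.

```javascript
// Walks a nested structure and collects every leaf whose key or value
// contains `term` (case-insensitive), along with the path to that leaf.
function searchTree(data, term, path = '') {
  const needle = term.toLowerCase();
  const results = [];
  for (const [key, value] of Object.entries(data)) {
    const here = `${path}/${key}`;
    if (value !== null && typeof value === 'object') {
      results.push(...searchTree(value, term, here));
    } else if (
      key.toLowerCase().includes(needle) ||
      String(value).toLowerCase().includes(needle)
    ) {
      results.push({ path: here, value: String(value) });
    }
  }
  return results;
}

// Hypothetical pipeline / action data.
const sample = {
  'example-pipeline': {
    'example-action': { class: 'ExampleIngestor', table: 'job_tasks' },
  },
};
const rows = searchTree(sample, 'job_');
```

Because this searches the full data structure rather than rendered nodes, it does not depend on which parts of the tree have been loaded or expanded.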

ryanrath added the experiment, new feature, and Category:General labels Jan 15, 2021

ryanrath added this to the 9.5.0 milestone Jan 15, 2021

ryanrath force-pushed the etl_viewer branch from a1d78e3 to 78af8b6 Compare February 25, 2021 16:08

jpwhite4 modified the milestones: 9.5.0, 10.0.0 Mar 8, 2021

ryanrath force-pushed the etl_viewer branch 2 times, most recently from 4dd0662 to 258a82f Compare March 31, 2021 15:15

jtpalmer changed the base branch from xdmod9.5 to xdmod10.0 June 14, 2021 18:53

Ryan Rathsam added 18 commits July 2, 2021 14:07

PHP 7.2 updates for by-ref function arguments

c1b34f0

PHP 7.2 requires that only variables be passed by ref. Wherever we were passing the result of a function by-ref, these changes first assign the result to a variable and then pass that variable.

Cloud raw_event table column size update

f4c14e3

Not sure why this wasn't caught before but the values being inserted into this column included values that were too large for `int`. After speaking with Greg we decided the easiest fix for now would be to update the column type to `bigint`.

Updating XDMoD spec file to Centos8 versions

7df9cef

Fixing some additional tests

6a22f2d

These changes were made so that the ETL tests produce the same logging as we expected previously.

Updating chromedriver version to match centos8

6ff838b

Just bumping the version of chromedriver available w/ Centos8

Removing the use of createLogger's nonexistent return value

298fd5a

Sooo `self::createLogger` doesn't actually return anything so there's no point in setting `self::$logger = nothing`.

Updating ETL Column Detection SQL for MariaDB 10.2+

ce25485

MariaDB 10.2+ started reporting default values as quoted, which broke the ETL table creation / modification code. These changes detect whether quotes are included and strip them, thus allowing our code to work again.

This should have been with the previous commit

dd82513

Updating Datetime default values to use triggers

d3d04cb

These changes only worked on MariaDB 10.0+ ( or MySQL 5.6.5 ) as that's when MariaDB allowed `datetime` columns to have default values of `NOW()` or `CURRENT_TIMESTAMP()`.

Reverting this log change as it's not needed

c238b63

Updating XDMoD's spec file to support Centos7&8

99e7739

These changes allow our RPM to be built for / installed on either Centos7 or 8.

Updates to account for PHP7/8 finfo_buffer differences

05bc7be

So upon further testing it appears that `finfo_buffer` w/ `FILEINFO_MIME` returns different values depending on whether it's PHP7 or 8. These changes allow our tests to account for these differences.

ryanrath and others added 21 commits July 2, 2021 14:45

Beginnings of selecting a file to view in ETL Viewer

1f021bb

Exploring adding JsonSerializable to some ETL classes

aee8fdc

Catching up

193a3fd

checkpoint

b2dd002

checkpoint

7c45c24

Adding Cytoscape Dependencies

eeb53f6

Just adding Cytoscape & the Dagre layout engine that we'll be using for the visualization of Pipelines / Actions.

De-PHP7-ifying things since this still has to run on PHP5.4

1e1c1bb

Adding global shift key detection

3dbcfe6

This is to support a keyboard shortcut for expanding or collapsing an entire tree item in the ETL Viewer.

Checkpointing

53953b0

Made some good progress on the Graph Panel and I don't want to lose it.

Refactoring tab adding

7b11776

Adding Cytoscape deps and the GraphPanel

6f06196

Add support for displaying tables in the Graph Panel

85e5e72

checkpointing

50559c4

Check Pointing

9a2bd43

Have things in enough of a shape for people to get an idea of how things work

Adding Graph viewing for an Action and fixing libraries

b4cb365

Check Pointing so I don't lose any work

1475eff

checkpoint

b787d9f

Checkpoint on working Tree Search

ffff064

ryanrath force-pushed the etl_viewer branch from abac182 to bdec23e Compare July 2, 2021 18:52

Updating docker images used

84552e5

ryanrath modified the milestones: 10.0.0, 10.5.0 Jan 13, 2022

jtpalmer changed the base branch from xdmod10.0 to xdmod10.5 March 24, 2022 14:50

jpwhite4 modified the milestones: 10.5.0, 11.0.0 May 24, 2023

jpwhite4 modified the milestones: 11.0.0, 11.5.0 Jun 17, 2024
