
ETL Viewer #1485

Draft
wants to merge 76 commits into
base: xdmod10.5

Conversation

ryanrath
Contributor

@ryanrath ryanrath commented Jan 15, 2021

**NOTE: this PR also includes the Centos 8 updates, which reminds me that I need to upload the centos8 docker image I'm using so that people can actually try this out. I'll try to upload that image overnight, so hopefully it'll be up and ready by Monday ( 2021.01.18 ) if people are interested in taking a look.**

NOTE the 2nd: If anybody has any features / functionality they would like to see incorporated please let me know

Description

I figured it was about time to get what I have for this up so that other people can take a look at it. You'll need to spin this up in a docker container, then log in to the admin interface, and you should see a new tab, "ETL Viewer". Right now it just displays every pipeline / action / action details in a TreeGrid. You can ignore the "File" drop down on the right side of the interface. In the beginning I had it in my head that we'd want to start looking at things on a file-by-file basis, hence this dropdown. But the code really isn't set up to do this at all, so I pivoted to viewing the final data structures produced by the ETLConfiguration class.

I think that an easy next step would be to add a "Search" box that searches each node & expands the nodes that match. Or it could just return all the matches along with the pipeline / action each was found in. Then, if you click a search result, it would expand / take you to the selected node in the overall data structure.

Motivation and Context

The ETL Config system is pretty spiff, but it can be hard to visualize / figure out where everything is / how it goes together.
Ultimately I want to add a visualization that makes it easy to understand where data is coming from & where it's going for each pipeline / action. I'll be adding a couple of screenshots of what I've experimented with so far.

Tests performed

Checklist:

  • The pull request description is suitable for a Changelog entry
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request

@ryanrath ryanrath added experiment Experimental feature new feature New functionality Category:General General labels Jan 15, 2021
@ryanrath ryanrath added this to the 9.5.0 milestone Jan 15, 2021
@jsperhac
Contributor

@ryanrath, I am excited to hear about this feature...super cool.

@ryanrath
Contributor Author

Here's a super rough concept that I worked on for a way of visualizing info for the ETL Viewer.

etl_viewer_visualizer

The big problem that I've run into so far w/ this method has been working out how to automatically lay out the elements for any given pipeline. I manually selected the data for this image to illustrate the basic concept of what I was looking to achieve, but it turns out that getting cytoscape to lay things out in this 3-column manner is somewhat... challenging, to say the least. Ultimately I think that I'd need to write a custom layout engine ( which cytoscape supports ) if this was something we wanted to move forward with.
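
As a rough illustration of the 3-column idea only (not code from this PR), the sketch below computes fixed positions for nodes that have already been sorted into columns. The node shape (`id`, `column`) and the column names `source`, `action`, and `destination` are assumptions made for the example; working out which column a node belongs to is the hard part described above and is not addressed here.

```javascript
// Hypothetical node shape: { id: string, column: 'source' | 'action' | 'destination' }.
const COLUMN_ORDER = ['source', 'action', 'destination'];

// Returns a map of node id -> { x, y }, stacking nodes top to bottom per column.
function threeColumnPositions(nodes, columnGap = 300, rowGap = 80) {
  const nextRow = { source: 0, action: 0, destination: 0 };
  const positions = {};
  for (const node of nodes) {
    const columnIndex = COLUMN_ORDER.indexOf(node.column);
    if (columnIndex === -1) {
      throw new Error(`Unknown column for node ${node.id}: ${node.column}`);
    }
    positions[node.id] = {
      x: columnIndex * columnGap,
      y: nextRow[node.column] * rowGap,
    };
    nextRow[node.column] += 1;
  }
  return positions;
}

// Illustrative ids only; they do not refer to real pipeline elements.
const positions = threeColumnPositions([
  { id: 'src_table_a', column: 'source' },
  { id: 'src_table_b', column: 'source' },
  { id: 'ingest_action', column: 'action' },
  { id: 'dest_table', column: 'destination' },
]);
console.log(positions);
```

A map like this could be handed to Cytoscape's built-in `preset` layout through its `positions` option, which may be enough for experimenting before committing to a custom layout engine.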

If anybody has any other ideas / thoughts please hit me up as I would really love to get further input on this.

@jpwhite4 jpwhite4 modified the milestones: 9.5.0, 10.0.0 Mar 8, 2021
@ryanrath ryanrath force-pushed the etl_viewer branch 2 times, most recently from 4dd0662 to 258a82f Compare March 31, 2021 15:15
@jtpalmer jtpalmer changed the base branch from xdmod9.5 to xdmod10.0 June 14, 2021 18:53
Ryan Rathsam added 18 commits July 2, 2021 14:07
In PHP 7.2 it's required that only variables are allowed to be passed by ref.
This set of changes extracts where we were passing the results of a function
by-ref into a variable and then passing that.
PHP 7.2 throws an exception now when `count` is used with something that does
not implement `Countable`. The `UserOrganizationTest` changes should have been
made previously as the `execute` function is documented as returning an integer.
The `Utilities` change was necessitated by `args` sometimes not being a
countable.
- Column.php: There was a problem w/ the way datetime columns w/ default
  values were having their column statements constructed. Namely, they were
  having the default value wrapped in single quotes, which causes problems in
  MariaDB 10.3.17.
- Table.php: This change was necessitated because MariaDB diverged
  from MySQL in how nulls are stored in the information schema.
- cloud_ingest_resource_specs.json: MariaDB datetime columns have a default
  format that cannot be changed that was preventing this data from being
  ingested. This change formats the incoming data appropriately.
- staging.json: *See cloud_ingest_resource_specs.json
- **/schema-version-history.json: This was something that probably never worked
  but just hadn't been caught because errors were not thrown. We've removed the
  `action_datetime` column from these data files and updated their associated
  tables to have a default value of `CURRENT_TIMESTAMP()`. Which works now
  because of the changes made to Column.php & Table.php.
- usage.schema.json: This change is, like the cloud_ingest_resource_specs.json
  change, due to the default format for MariaDB datetime columns.
Not sure why this wasn't caught before but the values being inserted into this
column included values that were too large for `int`. After speaking with Greg
we decided the easiest fix for now would be to update the column type to
`bigint`.
- **/*-aggregation*.json: The sql statements constructed from these configuration
  files were throwing a division by zero error. To resolve this problem an `IF`
  statement was added that ensured the `task.wallduration` value was always at
  least 1.
- post_ingest_update.sql: This statement wasn't operating as intended.
  After some investigation I found that the `INTERVAL 1 SECOND` was not
  supported and after discussion w/ Greg and some testing, changed it to `- 1`
  as the column values are already in seconds.
The original value of `NOW()` was being interpreted as a literal string value
as opposed to being treated as a function call, this was causing user creation
and updates to fail due to an incompatible value being provided for the
`password_last_updated` column. Replacing this w/ a php call to `date` means
that we provide a valid value in the form `Y-m-d H:i:s`.

**NOTE: there will be a follow on PR that refactors the `getUpdateQuery` and
`getInsertQuery` functions into a single `getQuery` function that greatly
simplifies & makes more readable the code responsible for generating the
required SQL Statement for creating / updating a user.**
All of these changes are due to the versions of software being updated on
Centos8 vs. Centos7.

- normalized_table_definition.json: Default values are now being returned as
single quoted strings in MariaDB 10.3.17.
- UsageExplorerTest.php: w/ PHP 7.2 installed these response header values have
changed and needed to be updated.
- RegressionTestHelper.php: w/ PHP 7.2 installed, when dealing with exported CSV
data we sometimes encounter and expect JSON data instead. For instance, when a
user requests data that they do not have access to a JSON object will be
returned. Unfortunately we compare this JSON data as strings and not as objects
/ arrays. This coupled with JSON Pretty Print not including a new line after the
opening square bracket of an empty array (example below ):

```json
{
    "property1": "",
    "property2": [

    ]
}
```

Versus

```json
{
    "property1": "",
    "property2": []
}
```

This meant that these tests were failing. I've just added a secondary test,
used if the initial `$expected === $csvdata` condition fails, that checks
whether `$expected` / `$csvdata` are actually JSON data.
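
The fallback described above can be sketched as follows. This is a JavaScript illustration of the idea, not the PHP code in `RegressionTestHelper.php`; the function names `deepEqual` and `sameExportedData` are made up for the example. It compares the strings first and, if they differ, parses both sides and compares the resulting structures, so whitespace differences from pretty printing no longer matter.

```javascript
// Structural equality for values produced by JSON.parse.
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key])
  );
}

// Exact string match first; otherwise fall back to comparing parsed JSON.
function sameExportedData(expected, actual) {
  if (expected === actual) return true;
  try {
    return deepEqual(JSON.parse(expected), JSON.parse(actual));
  } catch (error) {
    // At least one side is not valid JSON, so the string mismatch stands.
    return false;
  }
}

const withNewline = '{\n    "property1": "",\n    "property2": [\n\n    ]\n}';
const withoutNewline = '{\n    "property1": "",\n    "property2": []\n}';
console.log(sameExportedData(withNewline, withoutNewline)); // true
console.log(sameExportedData('a,b\n1,2', 'a,b\n1,3'));      // false
```

One caveat of this approach: two strings that both happen to parse as the same JSON scalar (for example `1` and `1.0`) would also be treated as equal, which may or may not be acceptable for CSV comparisons.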
These changes were made so that the ETL tests produce the same logging as we
expected previously.
Just bumping the version of chromedriver available w/ Centos8
Sooo `self::createLogger` doesn't actually return anything so there's no point
in setting `self::$logger = nothing`.
MariaDB 10.2+ started reporting default values as quoted, which broke the ETL
table creation / modification code. These changes detect whether quotes are
included and strip them, thus allowing our code to work again.
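
A minimal sketch of that quote-stripping step, written in JavaScript for illustration (the actual change lives in the PHP ETL code and may differ). The function name `stripDefaultQuotes` is hypothetical, and the sketch only handles a single pair of matching outer quotes; it does not un-escape quotes embedded in the value.

```javascript
// Removes one pair of matching outer quotes from a reported column default.
// Non-string values (e.g. null for "no default") are returned unchanged.
function stripDefaultQuotes(defaultValue) {
  if (typeof defaultValue !== 'string' || defaultValue.length < 2) {
    return defaultValue;
  }
  const first = defaultValue[0];
  const last = defaultValue[defaultValue.length - 1];
  if ((first === "'" || first === '"') && first === last) {
    return defaultValue.slice(1, -1);
  }
  return defaultValue;
}

console.log(stripDefaultQuotes("'0'"));               // 0
console.log(stripDefaultQuotes('CURRENT_TIMESTAMP')); // CURRENT_TIMESTAMP
console.log(stripDefaultQuotes(null));                // null
```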
These changes only worked on MariaDB 10.0+ ( or MySQL 5.6.5 ) as that's when
MariaDB allowed `datetime` columns to have default values of `NOW()` or
`CURRENT_TIMESTAMP()`.
These changes allow our RPM to be built for / installed on either Centos7 or 8.
So upon further testing it appears that `finfo_buffer` w/ `FILEINFO_MIME`
returns different values depending on whether it's PHP7 or 8. These changes
allow our tests to account for these differences.
The content_type reported by PHP7 for xml is `text/xml` while in PHP8 it's
reported as `text/xml;charset=UTF-8`. There are two ways we could handle this,
one is with the changes included in this commit. The other is that we can update
`ExportBuilder.php::$supported_formats["xml"]["render_as"]` to include
`charset=UTF-8`.
ryanrath and others added 21 commits July 2, 2021 14:45
…urce

- Updated the url for the TreeGrid to point at `/etl/pipelines/actions`
- Updated the `getActionsforPipelines` method chain so that it produces output
  suitable for use in an ExtJS TreeGrid.
  - The change @ ubccr#201 was needed due to `get_object_vars` returning a keyed
    array `<property> => <value>`.
  - ubccr#289: This function preps the output for use in an ExtJS TreeGrid.
  - ubccr#453: This was to simplify / declutter the display of the action name from its
    fully qualified form `xdmod.pipeline.action` to `action` as the nodes
    immediately preceding the action node will already be displaying the full
    pipeline name.
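
To make the shape of that output concrete, here is a hedged JavaScript sketch of the two transformations described in the list above: shortening a fully qualified action name, and turning a keyed `<property> => <value>` structure into nested tree nodes. The real work is done server-side in PHP; the function names and the `name` / `value` fields are assumptions for the example, while `leaf` and `children` follow the usual ExtJS tree node conventions.

```javascript
// 'xdmod.pipeline.action' -> 'action'; names without a dot are returned as-is.
function shortActionName(qualifiedName) {
  const lastDot = qualifiedName.lastIndexOf('.');
  return lastDot === -1 ? qualifiedName : qualifiedName.slice(lastDot + 1);
}

// Converts a keyed object into an array of tree nodes, recursing into
// nested objects / arrays and emitting leaves for scalar values.
function toTreeNodes(data) {
  return Object.entries(data).map(([name, value]) => {
    const isBranch = value !== null && typeof value === 'object';
    return isBranch
      ? { name, value: '', leaf: false, children: toTreeNodes(value) }
      : { name, value: String(value), leaf: true };
  });
}

// Illustrative data only; the property names are not taken from a real action.
const nodes = toTreeNodes({
  [shortActionName('xdmod.pipeline.action')]: { enabled: true, tags: ['a', 'b'] },
});
console.log(JSON.stringify(nodes, null, 2));
```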
Just adding Cytoscape & the Dagre layout engine that we'll be using for the
visualization of Pipelines / Actions.
This is to support a keyboard shortcut for expanding or collapsing an entire
tree item in the ETL Viewer.
- removed the File Select / Search Panel as this isn't being used atm.
- Added the ability for users to shift+click a tree node to recursively
  expand or collapse all of the child nodes.
- Added the ability for the ETLViewer's tree ( Ext.ux.tree.TreeGrid ) to
  automatically expand a column to take up the remaining space via the
  `autoExpandColumn` property.
Made some good progress on the Graph Panel and I don't want to lose it.
Have things in good enough shape for people to get an idea of how things work
Ultimately I don't think we're going to want to filter the TreeView in place
due to the asynchronous nature of the loading. We can hide nodes that haven't
been shown yet, but it's proving difficult to show them after they've been
hidden.

Not only that, but viewing the subsequently filtered data is problematic, again
due to the asynchronous nature of the tree. A user can open each pipeline to see
which nodes matched underneath, but they'd need to click each one individually.
I attempted to add an 'expandAll' after the filter process but even with a
number of nodes hidden it takes a loooong time to finish ( upwards of 5 seconds
or more ).

I think that a more useful display for a search would be a table w/ a minimum of
two columns, one that's the path to the node / attribute that matches the search
term and the other being the actual value that matched. We could then maybe
provide some context actions of navigating directly to the node in the
TreeView. This would be relatively easy as the tree already has an `expandPath`
function.
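
A minimal sketch of the search-results idea above, assuming the pipeline / action data is available as a plain nested object. The function name `searchTree` and the sample data are hypothetical. Each match is returned as a `path` / `value` pair, with the path joined by `/` on the assumption that this matches the separator the tree's `expandPath` expects.

```javascript
// Walks a nested object and returns one { path, value } row for every leaf
// whose key or value contains the search term (case-insensitive).
function searchTree(data, term, path = []) {
  const needle = term.toLowerCase();
  const matches = [];
  for (const [key, value] of Object.entries(data)) {
    const currentPath = [...path, key];
    if (value !== null && typeof value === 'object') {
      matches.push(...searchTree(value, term, currentPath));
    } else if (
      key.toLowerCase().includes(needle) ||
      String(value).toLowerCase().includes(needle)
    ) {
      matches.push({ path: '/' + currentPath.join('/'), value: String(value) });
    }
  }
  return matches;
}

// Made-up data for illustration; not a real XDMoD pipeline.
const pipelines = {
  'example-pipeline': {
    'example-action': { class: 'ExampleIngestor', definition_file: 'example/table.json' },
    'other-action': { class: 'ExampleAggregator', enabled: true },
  },
};
const results = searchTree(pipelines, 'ingestor');
console.log(results);
```

Because the search runs over the full data structure rather than over rendered tree nodes, it sidesteps the asynchronous loading problem: nothing has to be expanded until the user picks a result.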

There could also be a `Open Pipeline | Action` feature that opens the pipeline /
action in the Graph View.
@ryanrath ryanrath modified the milestones: 10.0.0, 10.5.0 Jan 13, 2022
@jtpalmer jtpalmer changed the base branch from xdmod10.0 to xdmod10.5 March 24, 2022 14:50
@jpwhite4 jpwhite4 modified the milestones: 10.5.0, 11.0.0 May 24, 2023
@jpwhite4 jpwhite4 modified the milestones: 11.0.0, 11.5.0 Jun 17, 2024
4 participants