Skip to content

Refactoring

Past due by 11 months 81% complete

General Guide

Since it is deployed directly from the repository, master branch will stay.
We will have a separate main branch and will merge refactored code into main.
Once we finish an integration tests and confirms that it is doing at least the same job as the current implementation,
we can set the main as a new master.

TL;DR: Open a pull request into main

General Guide

Since it is deployed directly from the repository, master branch will stay.
We will have a separate main branch and will merge refactored code into main.
Once we finish an integration tests and confirms that it is doing at least the same job as the current implementation,
we can set the main as a new master.

TL;DR: Open a pull request into main, not into master.

Higher Level TODOs

  • Create a package scicat_ingestor and gradually move all the existing code to the package.
    It not for deployment so it just needs to be accessible to the entry point script (The command or a script that is used for
    deployment)
  • Setting up smaller scope tests environments
  • Setting up automated continuous integration tests environments
  • Update documentation
  • Turn main branch the new master branch and archive master

Brainstorming what should be done for refactoring.
Feel free to shoot down the ideas : D

Encapsulating partial code into modules/functions

For example, these can be split into another module...?
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/online_ingestor.py#L116-L124

Kafka Consumer/Message Handling

As it is carrying the core-logic, we maybe better wrap the confluent kafka interfaces so that we can easily test them as well...?
For example,
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/online_ingestor.py#L71-L77
This kind of error-handling can be separated into a smaller function.
Or
Wrap this into a function or automate this somehow...?
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/online_ingestor.py#L32-L40

Logging

Logging seems like to require some configuration steps but despite of the configuration the logging interface logger.log(info/debug/error) should be the same.
But since the configuration of the logger requires file or link handling, it'd better be tested I guess...?

Ingestor

This guy definitely seems like to need some chopping.
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L222

For example,
I think this should be wrapped into a separate file handling interface so that we can test them.
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L258-L261

Redefining Interfaces

Nexus structure path

https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L28-L33

This mixture of filter and indexing can maybe refactored into more explicit way...?
But I also think it is already working conveniently already... hmm...

Ingestor

Documentation / Docstrings

Maybe we can collect the comments into docstrings. Copilot is good at it : D
And the documentation in the readme might be sufficient but we probably need to clean that up too.
Here is a list of some sections that might be needed in the documentation.

  • Deployment Environment/Method
  • Whole structure (briefly, maybe mermaid graph : D)
  • Logging options
  • API dependencies
  • Kafka Consumer/Message Handling
    It seems quite straight forward but since it's the core part of it. Maybe we can document it better.
    Shall we write some mermaid graph : D....??
Loading