General Guide

Since it is deployed directly from the repository, master branch will stay.
We will have a separate main branch and will merge refactored code into main.
Once we finish an integration tests and confirms that it is doing at least the same job as the current implementation,
we can set the main as a new master.

TL;DR: Open a pull request into main

General Guide

Since it is deployed directly from the repository, master branch will stay.
We will have a separate main branch and will merge refactored code into main.
Once we finish an integration tests and confirms that it is doing at least the same job as the current implementation,
we can set the main as a new master.

TL;DR: Open a pull request into main, not into master.

Higher Level TODOs

Create a package scicat_ingestor and gradually move all the existing code to the package.
It not for deployment so it just needs to be accessible to the entry point script (The command or a script that is used for
deployment)
Setting up smaller scope tests environments
Setting up automated continuous integration tests environments
Update documentation
Turn main branch the new master branch and archive master

Brainstorming what should be done for refactoring.
Feel free to shoot down the ideas : D

Encapsulating partial code into modules/functions

For example, these can be split into another module...?
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/online_ingestor.py#L116-L124

Kafka Consumer/Message Handling

As it is carrying the core-logic, we maybe better wrap the confluent kafka interfaces so that we can easily test them as well...?
For example,
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/online_ingestor.py#L71-L77
This kind of error-handling can be separated into a smaller function.
Or
Wrap this into a function or automate this somehow...?
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/online_ingestor.py#L32-L40

Logging

Logging seems like to require some configuration steps but despite of the configuration the logging interface logger.log(info/debug/error) should be the same.
But since the configuration of the logger requires file or link handling, it'd better be tested I guess...?

Ingestor

This guy definitely seems like to need some chopping.
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L222

For example,
I think this should be wrapped into a separate file handling interface so that we can test them.
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L258-L261

Redefining Interfaces

Nexus structure path

https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L28-L33

This mixture of filter and indexing can maybe refactored into more explicit way...?
But I also think it is already working conveniently already... hmm...

Ingestor

Maybe not redefining, but just better type-hinting or wrapping some of the arguments into a tighter data structure.
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L222
Some of if statements can be merged into one if we let get_instrument to accept defaultInstrument like get_nested_value_with_default.
https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L384-L392

Documentation / Docstrings

Maybe we can collect the comments into docstrings. Copilot is good at it : D
And the documentation in the readme might be sufficient but we probably need to clean that up too.
Here is a list of some sections that might be needed in the documentation.

Deployment Environment/Method
Whole structure (briefly, maybe mermaid graph : D)
Logging options
API dependencies
Kafka Consumer/Message Handling
It seems quite straight forward but since it's the core part of it. Maybe we can document it better.
Shall we write some mermaid graph : D....??

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring

General Guide

General Guide

Higher Level TODOs

Encapsulating partial code into modules/functions

Kafka Consumer/Message Handling

Logging

Ingestor

Redefining Interfaces

Nexus structure path

Ingestor

Documentation / Docstrings