Hey @guykatz, in my experience it is simplest to keep things as batch processing for as long as possible. That doesn't mean the source data has to be batched, though. For instance, your devices could emit events to a message broker such as RabbitMQ or Kafka, and your orchestrator could then process those events every 5 to 10 seconds, maintaining a cursor that marks the messages it has already read. Similarly, if you stay file-based, this isn't much different from uploading files to some storage and maintaining a cursor that points at the most recently processed file. However, if you really do need to process these events in near real time, it might be more reasonable to stand up a long-running process that subscribes to that topic and handles events as they come in. Ultimately it depends on what you are doing with this data downstream and on the time requirements you have for processing it. I'm with you that around the 5 s mark is where the lines start to blur.
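To make the file-based cursor idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption on my side rather than something from your setup: the function names, the `.dat` extension, the JSON cursor file, and above all the idea that file names sort chronologically (for example because they start with a timestamp). Your orchestrator would call `run_microbatch` on whatever schedule you settle on, e.g. every 5 seconds.

```python
import json
from pathlib import Path


def load_cursor(cursor_path):
    """Return the name of the last processed file, or "" if none yet."""
    if cursor_path.exists():
        return json.loads(cursor_path.read_text())["last_file"]
    return ""


def save_cursor(cursor_path, last_file):
    """Persist the cursor so the next run knows where to resume."""
    cursor_path.write_text(json.dumps({"last_file": last_file}))


def run_microbatch(data_dir, cursor_path, handle):
    """Process every file newer than the cursor, oldest first.

    Assumes file names sort chronologically (hypothetical naming scheme).
    Returns the names of the files processed in this run.
    """
    cursor = load_cursor(cursor_path)
    new_files = sorted(
        (p for p in data_dir.glob("*.dat") if p.name > cursor),
        key=lambda p: p.name,
    )
    for path in new_files:
        handle(path)
        # Advance the cursor only after the file was handled successfully,
        # so a crash causes a retry of that file rather than a skip.
        save_cursor(cursor_path, path.name)
    return [p.name for p in new_files]
```

Each run only picks up whatever arrived since the previous one, so it doesn't matter much whether the devices write a file every 5 minutes or every second; a shorter source interval just means more files per run. The same pattern carries over to a broker, where the cursor becomes a stored message offset instead of a file name.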
Hi all,
I was wondering where to draw the line between batch and stream data processing.
I assume there is no single definitive answer, so I want to describe my specific use case:
I have multiple proprietary devices that each generate a data file at a configurable interval of either 5 minutes or 30 seconds. So, from the get-go, the data is not presented by the source as a stream, since it is batched into files.
At some point in the future, I will also have to support a 1-second interval.
This 1-second option starts to blur the line between streaming to the destination through some mediator and batching with an orchestrator.
The data still originates at the source as a file rather than a stream, but the interval is now so small that, in orchestrator terms, I would practically need to run a cycle every second, which I believe is not reasonable.
I assume I could micro-batch and run every 5 seconds, for example, but would that be fine for an orchestrator to handle? I am really not sure.
So I am on the fence about how to deal with this issue and would like to hear opinions and experiences from the community.
thanks!