* postprocess: corrected some smells
* postprocess: renamed some variables and corrected for-loop variables
* postprocess: postprocessItem args
* postprocess: never set the parent's state before adding a child; the AddChild() method handles this
* item: reinforced CheckConsistency method
* global: enforcing stricter state and consistency check for items throughout stages in the pipeline
* item: corrected CheckConsistency() and made more unit tests
* item&finisher: make use of CompleteAndCheck() method on an item to parse the tree before handling further
* item: handle return conditions that CompleteAndCheck() overlooked
* pre/postprocess: trying to fix the flow of children
* dumper: add a Dump() function to properly dump an Item for further debugging
* preprocessor: correct exclusion logic
* item.Dedupe: corrected an edge case where a completed child has the same URL as the seed and dedupe was trying to remove the seed
* postprocess: correct failed outlink extraction behaviour
* Add more detailed pyroscope information
* postprocess: add more debug logging to troubleshoot an unknown bug
* preprocess: add itemId in panic
* postprocess: always postprocess an item EVEN IF ASSETS CAPTURE IS DISABLED
* archiver: close spooledBuffer if error happened during body processing
* postprocess: close all bodies of an item tree before continuing in the pipeline
* archiver: try to write bodies only to disk
* add: small memory optimization for URLToString & encodeQuery
* chore: upgrade Go version & dependencies
* chore: bump warc lib to v.0.8.62
* fix: usage of spooledtempfile lib
* chore: bump warc lib to v.0.8.63
* postprocess: defer a closeBodies call on every item that goes through
* log: disable log queue full error message when TUI is used
* cmd: add no-stderr-log flag
* hq.consumer: replace previousBatch check with a reactor duplicate check
* pyroscope: bump upload rate from 15s to 5s
* fix: add panic for errors in startPipeline, retry indefinitely on HQ start error
* fix: not returning when hq.Start fails to init HQ client
* fix: typo
* fix: HQ Start failure marking init as already done
* fix: panic when HQ init fails
* add: truthsocial.com preprocessing & post-processing
* chore: bump warc lib to v.0.8.64
* add: more truthsocial.com special handling
* add: more truthsocial.com special handling
* add: more truthsocial.com special handling
* fix: variable scope for truthsocial special handling
* fix: domains crawl
* fix: set assets hops to their seed hop
* fix: extraction of outlinks on assets
---------
Co-authored-by: Jake L <[email protected]>
Co-authored-by: Corentin Barreau <[email protected]>