Me again. I was able to isolate the problem to splash and opened PR #821 to fix it. With that change, the splash -> har2warc -> webrecorder pipeline is fully functional: all images and styling show up.
I'm going to close this issue.
Hi,
Thank you for writing har2warc. Below I describe what I tried, along with some minor differences from what I expected that I can't explain.
I pulled a docker image of splash created by @scrapinghub like so and ran it:
Then I rendered a page using splash and exported the resulting .har (as indicated in splash's docs):
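As an illustration only (not the command used above), a HAR export through Splash's `render.har` HTTP endpoint could be sketched in Python as follows. It assumes Splash is running locally on its default port 8050; the helper names `build_render_har_url` and `export_har` and the `wait` value are hypothetical. The `response_body=1` parameter asks Splash to include response bodies in the HAR, which a later conversion to WARC presumably needs in order to store stylesheets and images.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: Splash runs locally on its default port.
SPLASH_BASE = "http://localhost:8050"


def build_render_har_url(page_url, wait=2.0, base=SPLASH_BASE):
    """Build a render.har request URL for a Splash instance."""
    params = {
        "url": page_url,
        "wait": wait,        # seconds to wait after load so JavaScript can run
        "response_body": 1,  # include response bodies in the HAR
    }
    return f"{base}/render.har?{urlencode(params)}"


def export_har(page_url, out_path, **kwargs):
    """Fetch the HAR from Splash and save it (needs a running Splash)."""
    with urllib.request.urlopen(build_render_har_url(page_url, **kwargs)) as resp:
        har = json.load(resp)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(har, fh)
    return har
```

Calling `export_har("http://example.com/", "1.har")` would then write the HAR to `1.har`, ready for conversion.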
Then I converted the resulting .har to .warc:
After this, I imported the 1.warc file into webrecorder.
When I viewed the file as stored in webrecorder, all styling seemed to be missing.
I understand and agree that this does not involve har2warc alone; the problem could originate in any of har2warc, splash, or webrecorder. I'm not sure where to attribute this behaviour.
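One way to narrow this down, sketched here under the assumption that the HAR follows the HAR 1.2 layout (`log.entries[].response.content.text`), is to check which entries in the exported HAR actually carry a response body. If stylesheet or image entries have no body, the loss happened before har2warc ran. The function names are hypothetical.

```python
import json


def summarize_har_bodies(har):
    """Split the URLs of a parsed HAR dict into (with_body, without_body)."""
    with_body, without_body = [], []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        content = entry.get("response", {}).get("content", {})
        # HAR 1.2 keeps the body in content.text (optionally base64-encoded).
        if content.get("text"):
            with_body.append(url)
        else:
            without_body.append(url)
    return with_body, without_body


def summarize_har_file(path):
    """Load a .har file from disk and summarize which entries have bodies."""
    with open(path, encoding="utf-8") as fh:
        return summarize_har_bodies(json.load(fh))
```

For example, `summarize_har_file("1.har")` returns two lists of URLs; any `.css` URLs in the second list would point at the HAR export rather than at the conversion or replay steps.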
The general use case is automating a large archiving operation whose result is a faithful reproduction of the original website, including sites with a lot of JavaScript-rendered content, which is the case for many websites nowadays.
I'd be interested in your thoughts.
Thanks,
Stefan