IPv6 address representation in WARC-IP-Address field #100

Open
sebastian-nagel opened this issue Nov 20, 2024 · 4 comments

Comments

@sebastian-nagel

This question is about IPv6 address representation in WARC captures.

I'd be in favor of the format specified in RFC 5952. However, the WARC standard refers to RFC 4291 and says nothing about RFCs that supersede or update it. Are there any recommendations?
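
To illustrate why the choice matters: RFC 4291 allows several spellings of the same address, while RFC 5952 recommends a single one. The sketch below is only an illustration, not part of any WARC tool. It assumes that the `compressed` property of Python's standard `ipaddress` module (lowercase, no leading zeros, longest zero run replaced by `::`) matches the RFC 5952 form for ordinary addresses; special cases such as IPv4-mapped addresses may be rendered differently. The name `canonical` is hypothetical.

```python
import ipaddress

# Three spellings of one address, all valid under RFC 4291.
variants = [
    "2001:0DB8:0000:0000:0000:0000:0000:0001",  # fully expanded, upper case
    "2001:db8:0:0:0:0:0:1",                     # leading zeros dropped, no "::"
    "2001:db8::1",                              # zero run compressed
]


def canonical(text: str) -> str:
    """Return the compressed lowercase form produced by the ipaddress module.

    Assumption: for ordinary addresses this matches the RFC 5952 form.
    """
    return ipaddress.IPv6Address(text).compressed


# All three spellings collapse to a single string.
canonical_forms = {canonical(v) for v in variants}
print(canonical_forms)
```

Without a single agreed form, a plain string comparison of WARC-IP-Address values would treat these three spellings as different addresses.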

@ato
Member

ato commented Nov 21, 2024

Your suggestion seems sensible. I've added it as a community recommendation.

@wumpus

wumpus commented Nov 21, 2024

I'd guess that Browsertrix or other browser-based tools already generate IPv6 traffic -- what do they do with these addresses? And what about wget?

@ato
Member

ato commented Nov 21, 2024

wget calls inet_ntop, which POSIX seems to require only to produce "a text string suitable for presentation". It looks like the glibc and musl implementations would both produce the canonical form.
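
One way to check what a given platform does is to round-trip an address through the binary form. The sketch below uses Python's `socket.inet_pton` and `socket.inet_ntop`, which, as far as I know, delegate to the platform's own implementation on POSIX systems; that delegation is an assumption, and the helper name `platform_text_form` is hypothetical. Because the output depends on the C library, no expected output is shown.

```python
import socket


def platform_text_form(address: str) -> str:
    """Round-trip an IPv6 address through its 16-byte binary form.

    The returned text is whatever this platform's inet_ntop produces.
    """
    packed = socket.inet_pton(socket.AF_INET6, address)  # 16 raw bytes
    return socket.inet_ntop(socket.AF_INET6, packed)


samples = (
    "2001:0DB8:0000:0000:0000:0000:0000:0001",
    "2001:db8:0:0:1:0:0:1",
)
for sample in samples:
    print(sample, "->", platform_text_form(sample))
```

Running this on the system that produces the WARC files shows directly whether its inet_ntop emits the compressed lowercase form.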

The current version of browsertrix-crawler doesn't emit WARC-IP-Address. An older version I had lying around seemed to produce the canonical form.

Heritrix doesn't support IPv6. jwarc currently relies on Java's default, which is the expanded old "preferred" form.

ato added a commit to iipc/jwarc that referenced this issue Nov 21, 2024
@sebastian-nagel
Author

Thanks for the clarification!

I can confirm:

  • wget with glibc on Ubuntu 24.04 produces the canonical form
  • for Java (tested versions 11, 17, 21) a custom library is required to write the canonical form, for example Guava's InetAddresses.toAddrString()
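
Until every writer emits the same form, a reader comparing WARC-IP-Address values from different tools probably has to parse them rather than compare strings. The following is a minimal sketch under stated assumptions: `uncompressed_form` only emulates an expanded form with eight groups and no `::` (my understanding of the Java default described above, not actual Java output), and both helper names are hypothetical.

```python
import ipaddress


def uncompressed_form(address: str) -> str:
    """Emulate an expanded form: eight hex groups, no leading zeros, no '::'."""
    packed = ipaddress.IPv6Address(address).packed
    groups = [int.from_bytes(packed[i:i + 2], "big") for i in range(0, 16, 2)]
    return ":".join(format(group, "x") for group in groups)


def same_address(left: str, right: str) -> bool:
    """Compare two textual IPv6 addresses by value rather than by spelling."""
    return ipaddress.IPv6Address(left) == ipaddress.IPv6Address(right)


expanded = uncompressed_form("2001:db8::1")
compressed = "2001:db8::1"

print(expanded)
print(expanded == compressed)              # string comparison
print(same_address(expanded, compressed))  # value comparison
```

The string comparison fails while the value comparison succeeds, which is the practical cost of mixing forms across tools.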
