Proposal: Public Suffix API #676


mckenfra

This formalizes #231 into a concrete proposal.

@Rob--W Rob--W requested review from oliverdunk and xeenon August 20, 2024 12:29
Member

@oliverdunk oliverdunk left a comment


Thanks for this! I will reach out to the PSL maintainers I have been in contact with to ask them to take a look. I'll also share this internally to get an overall opinion from Chrome.

@mckenfra mckenfra changed the title Add Public Suffix API proposal Proposal: Public Suffix API Aug 23, 2024
meets any of the following criteria:
* Contains a character that is invalid in an Internationalized Domain Name (IDN) - e.g. symbols, whitespace
* Is an IP address - IPv4 or IPv6
* Is a public suffix itself - including the case of it being a single-label suffix not explicitly matched in the PSL


I think it would be good to think about the callers a bit more to determine what they want here.
If we assume the caller is something like isKnownPublicSuffix(...) then they would want a "true" for IPs and things that are already a public suffix but a false for empty labels or the root label. If we assume the caller is something like getHighlightParts(...) then an IP should probably look like a registrable domain.

Author

I have updated the API in c217156: getRegistrableDomain() is now a proxy for hasKnownPublicSuffix(), because it returns null when a domain is valid but has an unknown eTLD (i.e. it no longer defaults to assuming a single-label eTLD). So if the promise returned by getRegistrableDomain() fulfills with a non-null value, it can be inferred that the domain has an eTLD in the PSL.
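
The inference described above can be sketched as a small helper. This is only an illustration: `hasKnownPublicSuffix` is a hypothetical helper, not part of the proposal, and the `api` parameter stands in for whatever object ends up exposing `getRegistrableDomain()`.

```javascript
// Hypothetical helper, not part of the proposal.
// `api` is assumed to expose getRegistrableDomain(domain) -> Promise<string | null>.
async function hasKnownPublicSuffix(api, domain) {
  // A non-null result means the domain's eTLD was found in the PSL.
  // Rejections (e.g. invalid input) are deliberately left to propagate to the caller.
  const registrable = await api.getRegistrableDomain(domain);
  return registrable !== null;
}
```

Callers would still need to decide how to treat rejections, since invalid input is reported as an error rather than as `false`.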

The API does not currently have a getHighlightParts() method, but we could possibly add it in the Future Work section if we can identify an appropriate use case.

@simon-friedberger simon-friedberger Feb 3, 2025

I'm not sure if I was sufficiently clear here.

My point is that if I want to highlight the different URL parts in the address bar, or find out what the current "site" is in order to apply some setting, there needs to be an API that maps:

docs.google.com/... -> google.com - ICANN
foo.github.io -> foo.github.io - PRIVATE
10.0.0.1/login -> 10.0.0.1 - even though this is an IP
avocado.banana -> avocado.banana - even though this is not a valid TLD, because it can be made to resolve locally

On the other hand, if the extension is trying to determine whether something is a domain or a search term, we probably want:
docs.google.com/... -> google.com - ICANN
foo.github.io -> foo.github.io - PRIVATE
10.0.0.1/login -> 10.0.0.1 - even though this is an IP
avocado.banana -> ERROR/searchTerm

I think having different functions to call, depending on what people want, will probably make for a nicer API, but returning a tuple with additional information about the result, like type:RegistrableDomain|IPAddress|UnknownTLD, would also be an option.
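
A minimal sketch of the tagged-result option, to make the two mappings above concrete. Everything here is an assumption for illustration: `classifyHost`, the result shape, and the three-entry suffix table are hypothetical, the input is a bare host (no path), only dotted-decimal IPv4 is detected, and wildcard and exception rules are ignored. The extra `PublicSuffix` type is not in the list above; it is added only so that an input such as `github.io` is not misreported.

```javascript
// Toy suffix table for illustration only; a real implementation would use the full PSL.
const TOY_SUFFIXES = new Map([
  ["com", "ICANN"],
  ["io", "ICANN"],
  ["github.io", "PRIVATE"],
]);

// Dotted-decimal IPv4 only; IPv6 is out of scope for this sketch.
function isIPv4(host) {
  const parts = host.split(".");
  return (
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  );
}

// Returns { value, type, section? } where type is one of
// "RegistrableDomain" | "IPAddress" | "UnknownTLD" | "PublicSuffix".
function classifyHost(host) {
  if (isIPv4(host)) return { value: host, type: "IPAddress" };

  const name = host.toLowerCase();
  if (TOY_SUFFIXES.has(name)) return { value: name, type: "PublicSuffix" };

  const labels = name.split(".");
  // Try the longest candidate suffix first, always leaving one label in front of it.
  for (let i = 1; i < labels.length; i++) {
    const suffix = labels.slice(i).join(".");
    if (TOY_SUFFIXES.has(suffix)) {
      return {
        value: labels.slice(i - 1).join("."),
        type: "RegistrableDomain",
        section: TOY_SUFFIXES.get(suffix),
      };
    }
  }
  return { value: name, type: "UnknownTLD" };
}
```

With a result like this, a "current site" caller could accept every type and use `value`, while a "domain or search term" caller could treat `UnknownTLD` as a search term.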

Author

Yes I can see how that could be useful. Is it something we could address in a future version of this API?

@simon-friedberger simon-friedberger Feb 10, 2025

It doesn't seem like a lot of effort to me so I would say let's make sure to include it in the first version. (Otherwise people will always use the old version because "what if the new version isn't supported yet?")

@simon-friedberger simon-friedberger Feb 27, 2025

Sorry for not responding! I'm not 100% sure what you mean here but this does not cover IPV6 addresses. And how should this treat domains which are not registrable but intended to be suffixes like home.arpa? And what does getRegistrableDomain return on things which are not registrable but by the "single label rule" count as a TLD, like feeding it "localhost"?

Should we maybe just start prototyping this somewhere and write a bunch of tests for edge cases and see, for example, how it maps to the use in common extensions and the internal use of https://searchfox.org/mozilla-central/source/netwerk/dns/nsIEffectiveTLDService.idl in Firefox?

(I think you already did that!)

Author

@mckenfra mckenfra Feb 28, 2025

The reason I showed this example code is to demonstrate:

  1. Extensions have everything they need using the existing API in this proposal to cover your use cases.

    They still need to write their own code to pre-parse IP addresses, but this is a more general issue that is not specific to this API, since browsers do not currently offer anything built-in to do this for developers. For example, here is a webpage explaining how to parse IP addresses with JavaScript.

  2. Adding functionality to this API to determine if a string is an IP address (whichever type) may be making it more general-purpose than it should be.

    Perhaps it should be left up to the developer to filter out IP addresses before calling this API. Alternatively, functionality to parse IP addresses could be offered in a different API. It would certainly be useful, as this Stack Overflow question and this Stack Overflow question show.

In answer to your questions:

this does not cover IPV6 addresses

It was not meant to. The regex I gave was only there to point out that it would be up to extension developers to come up with a solution for pre-parsing IP addresses before calling this API.
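
For illustration, here is one way an extension might pre-parse both address families before calling the API. This is a sketch under assumptions, not part of the proposal: the function names are hypothetical, IPv4 is limited to plain dotted-decimal, and IPv6 validation is delegated to the `URL` constructor by wrapping the input in brackets.

```javascript
// Dotted-decimal IPv4 only (four parts, each 0-255). Other spellings that URL
// parsers accept, such as hex or shortened forms, are deliberately not handled.
function isIPv4(host) {
  const parts = host.split(".");
  return (
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  );
}

// IPv6 given without brackets, validated by the URL parser.
function isIPv6(host) {
  // Restrict the alphabet first so the input cannot smuggle in brackets or a path.
  if (!/^[0-9a-fA-F:.]+$/.test(host)) return false;
  try {
    new URL(`http://[${host}]/`); // throws a TypeError if the host is not valid IPv6
    return true;
  } catch {
    return false;
  }
}

function isIPAddress(host) {
  return isIPv4(host) || isIPv6(host);
}
```

An extension could call `isIPAddress(host)` first and only pass the value to `getRegistrableDomain()` when it returns false.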

how should this treat domains which are not registrable but intended to be suffixes like home.arpa

The API returns null if a string is a domain with an unknown suffix. This is used in my code example here:

  // Use case 1: what is the current site?
  const currentSite = await getIPAddressOrRegistrableDomain(domain) || domain;

In this code snippet, in the event the API returns null, the code substitutes the original domain string instead - i.e. it assumes it is an unknown local domain. (This is simplified code: the extension developer may want to do extra work to get the last label from the domain string instead of just using the string as-is.)

And what does getRegistrableDomain return on things which are not registrable but by the "single label rule" count as a TLD, like feeding it "localhost"?

An earlier version of this proposal stated that the "single-label rule" should apply in the case of domains with unknown suffixes. However, following reviewer feedback, this assumption was removed. Instead, where a domain has an unknown suffix, the API returns null. It is then left to the extension developer to decide what to do, e.g. to assume a single-label suffix.
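
As an illustration of what that developer-side decision could look like, here is a sketch of a fallback that assumes a single-label suffix when the API has returned null. The function name is hypothetical, and treating a single-label name such as "localhost" as having no registrable domain is an assumption of this sketch rather than something the proposal specifies.

```javascript
// Hypothetical fallback for a domain whose suffix is not in the PSL:
// assume the last label is the suffix, so the result is the last two labels.
function assumeSingleLabelSuffix(domain) {
  // Drop one trailing dot (fully qualified form) before splitting into labels.
  const labels = domain.replace(/\.$/, "").split(".");
  if (labels.length < 2) return null; // e.g. "localhost": the name is the suffix itself
  return labels.slice(-2).join(".");
}
```

For example, `assumeSingleLabelSuffix("printer.office.lan")` yields `"office.lan"`, while a bare `"localhost"` yields `null`.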

Should we maybe just start prototyping this somewhere and write a bunch of tests for edge cases and see, for example, how it maps to the use in common extensions and the internal use of https://searchfox.org/mozilla-central/source/netwerk/dns/nsIEffectiveTLDService.idl in Firefox?

Yes, use cases identified in common extensions are set out in this proposal. Perhaps you could review the table in section "7. Summary of behaviours" in this proposal, since it may help demonstrate how the API handles the various edge cases?

@simon-friedberger simon-friedberger Feb 28, 2025

I agree with you that the suggested API can be used to build what is necessary.

On the other hand, this is a browser API: it will very often be given a "location" and have to figure out what the "base domain" is. So why would we make each developer sort out the edge cases themselves again? I would prefer an API which is more aligned with what developers need.

Author

Maybe you could suggest the API method you have in mind? (I.e. method name, parameter(s), return type)

Author

@mckenfra mckenfra Mar 26, 2025

In the current proposal, an error is thrown if the value passed to getRegistrableDomain(value) is invalid, e.g. if it contains invalid characters or is an IP address.

One way of supporting your use case would be to make it possible to determine if the error was thrown due to the value being an IP address. We could add a requirement that the cause property of the error should be set to { code: "IPAddress" }.

Then, my earlier example code to handle your use cases would be updated as follows:

async function getIPAddressOrRegistrableDomain(domain) {
  try {
    return await publicSuffix.getRegistrableDomain(domain);
  } catch (e) {
    if (e.cause && e.cause.code === "IPAddress") {
      return domain; // It's an IP address
    }
    throw e; // Not an IP address, so rethrow the error
  }
}

try {

  // Use case 1: what is the current site?
  const currentSite = await getIPAddressOrRegistrableDomain(domain) || domain;

  // Use case 2: is this a known domain, or should I search?
  const isDomain = await getIPAddressOrRegistrableDomain(domain) !== null;

} catch (e) {
  // Invalid domain
}
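
To experiment with the example above before any browser implements the API, a stand-in for `publicSuffix` could look like the sketch below. It is not the proposed implementation: the three-entry suffix list, the loose IPv4 pattern, the character check, and the error messages are all assumptions, and only the `{ code: "IPAddress" }` cause follows the suggestion in this comment.

```javascript
// Toy suffix list, longest entries first; a real implementation would use the full PSL.
const STUB_SUFFIXES = ["github.io", "com", "io"];

function lookupRegistrableDomain(value) {
  // Loose dotted-decimal check, enough for experimenting (does not range-check parts).
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(value)) {
    throw new Error(`Not a domain: ${value}`, { cause: { code: "IPAddress" } });
  }
  // Simplified character check; real IDN validation is more involved.
  if (!/^[a-z0-9.-]+$/i.test(value)) {
    throw new Error(`Invalid domain: ${value}`);
  }
  for (const suffix of STUB_SUFFIXES) {
    if (value === suffix) throw new Error(`Is a public suffix: ${value}`);
    if (value.endsWith(`.${suffix}`)) {
      // Keep only the label directly in front of the matched suffix.
      const rest = value.slice(0, -(suffix.length + 1)).split(".");
      return `${rest[rest.length - 1]}.${suffix}`;
    }
  }
  return null; // syntactically valid, but the suffix is unknown
}

// Same shape as the API used in the example: an async getRegistrableDomain().
const publicSuffix = {
  getRegistrableDomain: async (value) => lookupRegistrableDomain(value),
};
```

With this stub in scope, `getIPAddressOrRegistrableDomain("10.0.0.1")` resolves to the IP address itself, and an unknown suffix such as `avocado.banana` resolves to null.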
