Investigate using GraphQL as our backend #5
Comments
We really need to do this; my token just got rate limited!
I looked into this in the meantime, and there doesn't seem to be a good way to get multiple releases in a single request. However, this is my first time using GraphQL, so I might just be overlooking something.
Yes, that definitely helped; I haven't had a rate limiting issue since then.
At first I also struggled to figure out how to run multiple queries in one request with GraphQL, but I finally figured it out:
Both
Here is a full Python snippet:

from typing import Optional, Dict, Any
import urllib.parse
import urllib.request
import json
import sys
import os


class GithubClient:
    def __init__(self, api_token: Optional[str]) -> None:
        self.api_token = api_token

    def _request(
        self, path: str, method: str, data: Optional[Dict[str, Any]] = None
    ) -> Any:
        url = urllib.parse.urljoin("https://api.github.com/", path)
        headers = {"Content-Type": "application/json"}
        if self.api_token:
            headers["Authorization"] = f"token {self.api_token}"

        body = None
        if data:
            # json.dumps escapes non-ASCII characters by default
            body = json.dumps(data).encode("ascii")

        req = urllib.request.Request(url, headers=headers, method=method, data=body)
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def post(self, path: str, data: Dict[str, str]) -> Any:
        return self._request(path, "POST", data)

    def graphql(self, query: str) -> Dict[str, Any]:
        resp = self.post("/graphql", data=dict(query=query))
        # Any entry under "errors" is treated as a failure of the whole request
        if "errors" in resp:
            raise RuntimeError(f"Expected data from graphql api, got: {resp}")
        data: Dict[str, Any] = resp["data"]
        return data


# Two repository lookups in one request, told apart by the aliases a and b
query = """
{
  a: repository(name: "nur-packages", owner: "Mic92") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
  b: repository(name: "nur-packages", owner: "balsoft") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
}
"""

token = os.environ.get("GITHUB_TOKEN")
if not token:
    print("GITHUB_TOKEN not set")
    sys.exit(1)

client = GithubClient(api_token=token)
d = client.graphql(query)
print(d)
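The query above lists each repository by hand under its own alias. For a longer list, the same document could be generated instead. This is only a minimal sketch: `build_head_commit_query` and the `r0`, `r1`, ... alias scheme are hypothetical names, and it assumes `json.dumps` output is acceptable as a GraphQL string literal for ordinary owner and repository names.

```python
import json
from typing import Iterable, Tuple


def build_head_commit_query(
    repos: Iterable[Tuple[str, str]], branch: str = "master"
) -> str:
    """Build one GraphQL document with one aliased `repository` field per repo."""
    fields = []
    for index, (owner, name) in enumerate(repos):
        # Aliases must be valid GraphQL names, so use r0, r1, ... rather than
        # the repository name, which may contain characters like "-" or ".".
        fields.append(
            f"  r{index}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            f"    ref(qualifiedName: {json.dumps(branch)}) {{\n"
            "      target { ... on Commit { history(first: 1) { edges { node { oid } } } } }\n"
            "    }\n"
            "  }"
        )
    return "{\n" + "\n".join(fields) + "\n}"


query_text = build_head_commit_query(
    [("Mic92", "nur-packages"), ("balsoft", "nur-packages")]
)
print(query_text)
```

The resulting string can be passed to a helper like `client.graphql(...)`; the caller has to keep the list order around to map `r0`, `r1`, ... back to repositories.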
Is this repo still used? I was under the impression that this functionality had been ported to Haskell and merged into the main nixpkgs-update repo, but it hasn't been?
Oh, cool! I guess it's just low-maintenance code then... I've looked into using the GraphQL API, and it definitely seems to be an improvement: we can easily hit multiple repos with a single "request point" (I think every repo only counts for a single request, so we should be able to hit 100 repos per request point). For reference, here is the query I used:

{
  a: repository(owner: "microsoft", name: "vscode") {
    ...releaseInfo
  }
  b: repository(owner: "junegunn", name: "fzf") {
    ...releaseInfo
  }
  c: repository(owner: "foobar", name: "arsadf") {
    ...releaseInfo
  }
  d: repository(owner: "jgm", name: "pandoc") {
    ...releaseInfo
  }
  e: repository(owner: "swaywm", name: "sway") {
    ...releaseInfo
  }
  f: repository(owner: "sagemath", name: "sage") {
    ...releaseInfo
  }
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }
}

fragment releaseInfo on Repository {
  releases(first: 10) {
    nodes {
      tagName
      isPrerelease
      isDraft
      publishedAt
    }
  }
}

Perhaps we'll even be able to do a full query of all repos in one shot, but I think some batching is still in order.
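One thing to watch with a batched query like this: if one alias points at a repository that does not exist (the `foobar/arsadf` entry looks like such a probe), the response will presumably carry `null` for that alias together with an `errors` entry, while the other aliases still resolve. A helper that raises on any `errors` key would then throw away the whole batch. The sketch below shows one way to separate the two cases. `split_release_results` is a hypothetical name, and the sample response is hand-written in the assumed shape, not real API output.

```python
from typing import Any, Dict, List, Tuple


def split_release_results(
    resp: Dict[str, Any],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Separate aliases that resolved to a repository from aliases that came back null."""
    data = resp.get("data") or {}
    found: Dict[str, List[str]] = {}
    missing: List[str] = []
    for alias, repo in data.items():
        if alias == "rateLimit":
            continue  # bookkeeping field, not a repository
        if repo is None:
            missing.append(alias)
            continue
        # Keep only published, stable releases
        found[alias] = [
            node["tagName"]
            for node in repo["releases"]["nodes"]
            if not node["isDraft"] and not node["isPrerelease"]
        ]
    return found, missing


# Hand-written sample in the assumed response shape (illustrative values only).
sample = {
    "data": {
        "a": {
            "releases": {
                "nodes": [
                    {"tagName": "v1.0.0", "isPrerelease": False, "isDraft": False},
                    {"tagName": "v1.1.0-rc1", "isPrerelease": True, "isDraft": False},
                ]
            }
        },
        "c": None,
        "rateLimit": {"limit": 5000, "cost": 1, "remaining": 4999},
    },
    "errors": [{"type": "NOT_FOUND", "path": ["c"], "message": "placeholder"}],
}

found, missing = split_release_results(sample)
print(found, missing)
```

Aliases in `missing` can be logged or retried individually without discarding the releases that did come back.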
You would probably run into a timeout eventually, but a larger batch size would definitely be more efficient than scraping each repo individually.
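The batching itself is simple to sketch. The helper below splits a repository list into fixed-size chunks so that each chunk can become one GraphQL request. `batched` and the batch size of 2 are illustrative choices only; a workable size would have to be found by experiment against the API's timeouts and limits.

```python
from typing import Iterator, List, Sequence, Tuple

Repo = Tuple[str, str]  # (owner, name)


def batched(items: Sequence[Repo], size: int) -> Iterator[List[Repo]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


repos = [
    ("microsoft", "vscode"),
    ("junegunn", "fzf"),
    ("jgm", "pandoc"),
    ("swaywm", "sway"),
    ("sagemath", "sage"),
]

for batch in batched(repos, 2):
    # Each batch would be turned into one aliased query and sent as one request.
    print(batch)
```

With five repositories and a batch size of 2 this produces three requests instead of five; the saving grows with the batch size until a timeout or limit is hit.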
It seems like we should be able to request all releases in one go (or in batches) instead of in one request per repo.
This should be much faster.