Unable to scrape any stats: 404 #19
Comments
I'll take a peek at this today. It's possible that the endpoint has changed.
Did you find the cause of this? I'm having the exact same problem.
The root cause is that the main landing page for User/Publication stats has changed. Medium has (rightly) moved the original query there, which I previously had to hack around, to GraphQL, like the rest of the queries the frontend generates. In short, the CLI and main entrypoints are very broken, but some of the Python client code still works. Here is an example script that I think roughly matches the behavior of the previously working CLI (for publications only; pagination is not implemented):

import argparse
import json
from datetime import datetime
from pathlib import Path
from medium_stats.scraper import StatGrabberPublication
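

def build_summary_stats_payload(publication_id: str):
    # GraphQL request body for the publication's lifetime story stats.
    # Only the first page of up to 50 published posts is requested; "after" is the
    # pagination cursor, left empty because pagination is not implemented here.
    return {
        "operationName": "PublicationLifetimeStoryStatsPostsQuery",
        "variables": {
            "id": publication_id,
            "first": 50,
            "after": "",
            "orderBy": {"publishedAt": "DESC"},
            "filter": {"published": True},
        },
        "query": """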
query PublicationLifetimeStoryStatsPostsQuery($id: String!, $first: Int!, $after: String!, $orderBy: PublicationPostsOrderBy, $filter: PublicationPostsFilter) {
publication(id: $id) {
id
publicationPostsConnection(
first: $first
after: $after
orderBy: $orderBy
filter: $filter
) {
__typename
edges {
...PublicationLifetimeStoryStats_relayPublicationPostEdge
__typename
}
pageInfo {
endCursor
hasNextPage
__typename
}
}
__typename
}
}
fragment PublicationLifetimeStoryStats_relayPublicationPostEdge on RelayPublicationPostEdge {
listedAt
node {
id
...LifetimeStoryStats_post
__typename
}
__typename
}
fragment LifetimeStoryStats_post on Post {
id
...StoryStatsTable_post
...MobileStoryStatsTable_post
__typename
}
fragment StoryStatsTable_post on Post {
...StoryStatsTableRow_post
__typename
id
}
fragment StoryStatsTableRow_post on Post {
id
firstBoostedAt
isLocked
totalStats {
views
reads
__typename
}
earnings {
total {
currencyCode
nanos
units
__typename
}
__typename
}
...TablePostInfos_post
...usePostStatsUrl_post
__typename
}
fragment TablePostInfos_post on Post {
id
title
readingTime
isLocked
visibility
...usePostUrl_post
...Star_post
...PostPreviewByLine_post
__typename
}
fragment usePostUrl_post on Post {
id
creator {
...userUrl_user
__typename
id
}
collection {
id
domain
slug
__typename
}
isSeries
mediumUrl
sequence {
slug
__typename
}
uniqueSlug
__typename
}
fragment userUrl_user on User {
__typename
id
customDomainState {
live {
domain
__typename
}
__typename
}
hasSubdomain
username
}
fragment Star_post on Post {
id
creator {
id
__typename
}
__typename
}
fragment PostPreviewByLine_post on Post {
id
creator {
...PostPreviewByLine_user
__typename
id
}
collection {
...PostPreviewByLine_collection
__typename
id
}
...CardByline_post
__typename
}
fragment PostPreviewByLine_user on User {
id
__typename
...CardByline_user
...ExpandablePostByline_user
}
fragment CardByline_user on User {
__typename
id
name
username
mediumMemberAt
socialStats {
followerCount
__typename
}
...useIsVerifiedBookAuthor_user
...userUrl_user
...UserMentionTooltip_user
}
fragment useIsVerifiedBookAuthor_user on User {
verifications {
isBookAuthor
__typename
}
__typename
id
}
fragment UserMentionTooltip_user on User {
id
name
username
bio
imageId
mediumMemberAt
membership {
tier
__typename
id
}
...UserAvatar_user
...UserFollowButton_user
...useIsVerifiedBookAuthor_user
__typename
}
fragment UserAvatar_user on User {
__typename
id
imageId
mediumMemberAt
membership {
tier
__typename
id
}
name
username
...userUrl_user
}
fragment UserFollowButton_user on User {
...UserFollowButtonSignedIn_user
...UserFollowButtonSignedOut_user
__typename
id
}
fragment UserFollowButtonSignedIn_user on User {
id
name
__typename
}
fragment UserFollowButtonSignedOut_user on User {
id
...SusiClickable_user
__typename
}
fragment SusiClickable_user on User {
...SusiContainer_user
__typename
id
}
fragment SusiContainer_user on User {
...SignInOptions_user
...SignUpOptions_user
__typename
id
}
fragment SignInOptions_user on User {
id
name
__typename
}
fragment SignUpOptions_user on User {
id
name
__typename
}
fragment ExpandablePostByline_user on User {
__typename
id
name
imageId
...userUrl_user
...useIsVerifiedBookAuthor_user
}
fragment PostPreviewByLine_collection on Collection {
id
__typename
...CardByline_collection
...CollectionLinkWithPopover_collection
}
fragment CardByline_collection on Collection {
name
...collectionUrl_collection
__typename
id
}
fragment collectionUrl_collection on Collection {
id
domain
slug
__typename
}
fragment CollectionLinkWithPopover_collection on Collection {
...collectionUrl_collection
...CollectionTooltip_collection
__typename
id
}
fragment CollectionTooltip_collection on Collection {
id
name
slug
description
subscriberCount
customStyleSheet {
header {
backgroundImage {
id
__typename
}
__typename
}
__typename
id
}
...CollectionAvatar_collection
...CollectionFollowButton_collection
__typename
}
fragment CollectionAvatar_collection on Collection {
name
avatar {
id
__typename
}
...collectionUrl_collection
__typename
id
}
fragment CollectionFollowButton_collection on Collection {
__typename
id
name
slug
...collectionUrl_collection
...SusiClickable_collection
}
fragment SusiClickable_collection on Collection {
...SusiContainer_collection
__typename
id
}
fragment SusiContainer_collection on Collection {
name
...SignInOptions_collection
...SignUpOptions_collection
__typename
id
}
fragment SignInOptions_collection on Collection {
id
name
__typename
}
fragment SignUpOptions_collection on Collection {
id
name
__typename
}
fragment CardByline_post on Post {
...DraftStatus_post
...Star_post
...shouldShowPublishedInStatus_post
__typename
id
}
fragment DraftStatus_post on Post {
id
pendingCollection {
id
creator {
id
__typename
}
...BoldCollectionName_collection
__typename
}
statusForCollection
creator {
id
__typename
}
isPublished
__typename
}
fragment BoldCollectionName_collection on Collection {
id
name
__typename
}
fragment shouldShowPublishedInStatus_post on Post {
statusForCollection
isPublished
__typename
id
}
fragment usePostStatsUrl_post on Post {
id
creator {
id
username
__typename
}
__typename
}
fragment MobileStoryStatsTable_post on Post {
id
firstBoostedAt
isLocked
totalStats {
reads
views
__typename
}
earnings {
total {
currencyCode
nanos
units
__typename
}
__typename
}
...TablePostInfos_post
...usePostStatsUrl_post
__typename
}
""",
    }


def get_article_ids(data: dict) -> list[str]:
    articles = data["data"]["publication"]["publicationPostsConnection"]["edges"]
    return [article["node"]["id"] for article in articles]


def create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--slug", required=True)
    parser.add_argument("--sid", required=True)
    parser.add_argument("--uid", required=True)
    parser.add_argument("--start", required=True)
    parser.add_argument("--stop", required=True)
    parser.add_argument("--output-dir", required=True)
    return parser


def write_json(data: dict, path: Path):
    json_str = json.dumps(data, indent=4)
    path.write_text(json_str)


def main():
    parser = create_parser()
    args = parser.parse_args()
    start = datetime.fromisoformat(args.start)
    stop = datetime.fromisoformat(args.stop)
    base_export_path = Path(args.output_dir) / "stats_exports" / args.slug
    base_export_path.mkdir(parents=True, exist_ok=True)
    pub = StatGrabberPublication(slug=args.slug, sid=args.sid, uid=args.uid, start=start, stop=stop)

    # get publication views & visitors (like the stats landing page)
    views = pub.get_events(type_="views")
    visitors = pub.get_events(type_="visitors")
    write_json(views, base_export_path / "views.json")
    write_json(visitors, base_export_path / "visitors.json")

    # get summary stats for publication articles
    # (first page of up to 50 posts only; pagination is not implemented)
    gql_endpoint = "https://medium.com/_/graphql"
    payload = build_summary_stats_payload(pub.id)
    response = pub.session.post(gql_endpoint, json=payload)
    response.raise_for_status()
    summary_stats = response.json()
    write_json(summary_stats, base_export_path / "summary_stats.json")

    # get individual article statistics
    articles = get_article_ids(summary_stats)
    article_events = pub.get_all_story_stats(articles)
    write_json(article_events, base_export_path / "article_events.json")
    referrers = pub.get_all_story_stats(articles, type_="referrer")
    write_json(referrers, base_export_path / "referrers.json")


if __name__ == "__main__":
    main()

You can then call it like this:

python example.py --slug $MEDIUM_SLUG --sid $MEDIUM_SID --uid $MEDIUM_UID --start 2020-01-01 --stop 2024-07-01 --output-dir ./

Let me know whether this works as a workaround for now. Given how much the structure of the Medium stats page has changed, I'm inclined to say this whole library needs a rewrite, which has been overdue for a long time. Realistically, it just needs to be a wrapper client over the GraphQL API.
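The script above only fetches the first page of up to 50 posts. The query already asks for `pageInfo { endCursor hasNextPage }`, so one possible way to add pagination is to keep re-sending the payload with `variables.after` set to the previous page's `endCursor`. This is a minimal sketch, assuming Medium accepts the cursor that way (not verified here); `fetch_all_edges` is a hypothetical helper, and `post` is any callable that sends a payload and returns the decoded JSON.

```python
import copy
from typing import Callable


def fetch_all_edges(
    payload: dict, post: Callable[[dict], dict], max_pages: int = 100
) -> list[dict]:
    """Follow publicationPostsConnection cursors until hasNextPage is false.

    `payload` is the dict from build_summary_stats_payload(); `post` sends one
    payload and returns the decoded JSON response (hypothetical interface).
    """
    payload = copy.deepcopy(payload)  # don't mutate the caller's payload
    edges: list[dict] = []
    for _ in range(max_pages):
        data = post(payload)
        connection = data["data"]["publication"]["publicationPostsConnection"]
        edges.extend(connection["edges"])
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return edges
        # Assumption: the endCursor of one page is accepted as `after` for the next.
        payload["variables"]["after"] = page_info["endCursor"]
    # Guard against a server that keeps reporting hasNextPage forever.
    raise RuntimeError(f"stopped after {max_pages} pages; hasNextPage was still true")
```

Inside `main()`, one could call `fetch_all_edges(payload, lambda p: pub.session.post(gql_endpoint, json=p).json())` and use `[edge["node"]["id"] for edge in edges]` as the article IDs (that lambda skips `raise_for_status()`, so add it back if you want HTTP errors surfaced). Passing the transport in as a callable keeps the cursor logic testable without hitting Medium.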
This is great. Thanks so much for your efforts in providing it. The JSON structure is different, but most of the info seems to be there. 👍
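Since the JSON structure differs from the old export, a small post-processing step can flatten `summary_stats.json` into one row per article. This is a sketch that assumes the field names requested by the query in the script above (`title`, `totalStats`, `earnings.total`) and assumes `earnings.total` follows the `google.type.Money` convention, where the amount is `units + nanos / 10^9`. `flatten_summary_stats` and `money_to_float` are hypothetical names.

```python
from typing import Optional


def money_to_float(total: Optional[dict]) -> Optional[float]:
    """Combine a {units, nanos} money object into a single number.

    Assumption: google.type.Money semantics (amount = units + nanos / 1e9);
    `units` may arrive as a string, so it is converted with int().
    """
    if not total:
        return None
    return int(total.get("units") or 0) + int(total.get("nanos") or 0) / 1e9


def flatten_summary_stats(summary_stats: dict) -> list[dict]:
    """Turn the nested GraphQL response into one flat row per article."""
    edges = summary_stats["data"]["publication"]["publicationPostsConnection"]["edges"]
    rows = []
    for edge in edges:
        post = edge["node"]
        totals = post.get("totalStats") or {}
        earnings = (post.get("earnings") or {}).get("total")
        rows.append(
            {
                "id": post["id"],
                "title": post.get("title"),
                "views": totals.get("views"),
                "reads": totals.get("reads"),
                "earnings": money_to_float(earnings),
                "currency": earnings.get("currencyCode") if earnings else None,
            }
        )
    return rows
```

Missing `totalStats` or `earnings` objects become `None` rather than raising, so locked or unmonetized posts still produce a row.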
@otosky, sorry to bug you with this question. There used to be a Could you point me in the right direction? EDIT: Never mind, I think I found it at:
On a side note, @otosky: food for thought.
Issue
Until a week ago, scraping publication stats worked.
Suddenly, last week, it stopped working.
Command:
The error:
JSON was expected, but not returned.
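When a stats endpoint moves, the client can receive an HTML 404 page instead of JSON, which surfaces as a confusing decode error. Below is a minimal sketch of a stricter decoder, assuming a `requests`-style response object (only `status_code`, `headers` and `text` are used). `parse_json_response` and `StatsEndpointError` are hypothetical names, and the `])}while(1);</x>` prefix handling reflects what Medium's older JSON endpoints were known to return, not something confirmed in this thread.

```python
import json
from typing import Any

# Assumption: Medium's older (non-GraphQL) JSON endpoints prepend this
# anti-JSON-hijacking prefix; it is stripped only when present.
XSSI_PREFIX = "])}while(1);</x>"


class StatsEndpointError(RuntimeError):
    """Raised when an endpoint does not return usable JSON (e.g. a moved page)."""


def parse_json_response(response) -> Any:
    """Decode a requests-style response, failing with the status and a body snippet."""
    content_type = response.headers.get("Content-Type", "")
    body = response.text
    if response.status_code >= 400:
        # In this issue, the 404 turned out to mean the stats page had moved.
        raise StatsEndpointError(
            f"HTTP {response.status_code} ({content_type!r}): {body[:80]!r}"
        )
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise StatsEndpointError(
            f"expected JSON but got {content_type!r}: {body[:80]!r}"
        ) from exc
```

Reporting the status code and the first few characters of the body makes it obvious whether the endpoint returned an error page or a changed format.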
Expected result
medium_stats would output the stats to
./stats_export/<publication>
Debugging steps
Is anyone else running into this, or does anyone have a workaround? Or is Medium updating its stats pages?