Scrape Twitter accounts for fine-tuning + merging at high volume

clydedevv/twitter-scraper-finetune

Degen Scraper

Pipeline for generating AI character files and training datasets by scraping public figures' online presence across Twitter and blogs.

⚠️ IMPORTANT: Create a new Twitter account for this tool. DO NOT use your main account: scraping activity may trigger Twitter's automation detection and lead to restrictions on that account.

Setup

  1. Install dependencies:

    npm install
  2. Copy .env.example to a new .env file and fill in the values:

    # (Required) Twitter Authentication
    TWITTER_USERNAME=     # your Twitter username
    TWITTER_PASSWORD=     # your Twitter password
    
    # (Optional) Blog Configuration
    BLOG_URLS_FILE=      # path to file containing blog URLs
    
    # (Optional) Scraping Configuration
    MAX_TWEETS=          # max tweets to scrape
    MAX_RETRIES=         # max retries for scraping
    RETRY_DELAY=         # delay between retries
    MIN_DELAY=           # minimum delay between requests
    MAX_DELAY=           # maximum delay between requests
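The optional scraping settings above are only listed by name. As a minimal sketch of how such values are commonly applied, the JavaScript below reads them from an environment object, picks a random wait between MIN_DELAY and MAX_DELAY, and retries a failing request up to MAX_RETRIES times with RETRY_DELAY between attempts. It assumes the delays are in milliseconds and uses made-up defaults; loadScrapeConfig, randomDelay and withRetries are hypothetical helpers, not functions exported by this repo.

```javascript
// Hypothetical fallback values; the tool's real defaults are not documented here.
// All delay values are ASSUMED to be milliseconds.
const DEFAULTS = {
  MAX_TWEETS: 1000,
  MAX_RETRIES: 3,
  RETRY_DELAY: 5000,
  MIN_DELAY: 1000,
  MAX_DELAY: 3000,
};

// Parse one non-negative integer setting, falling back to DEFAULTS when unset or empty.
function readInt(env, key) {
  const raw = String(env[key] ?? "").trim();
  if (raw === "") return DEFAULTS[key];
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Build the scraping config from an env object (process.env by default).
function loadScrapeConfig(env = process.env) {
  const config = {};
  for (const key of Object.keys(DEFAULTS)) {
    config[key] = readInt(env, key);
  }
  if (config.MIN_DELAY > config.MAX_DELAY) {
    throw new Error("MIN_DELAY must not be greater than MAX_DELAY");
  }
  return config;
}

// Pick a random wait in [MIN_DELAY, MAX_DELAY] so requests are not evenly spaced.
function randomDelay(config, rand = Math.random) {
  const span = config.MAX_DELAY - config.MIN_DELAY;
  return config.MIN_DELAY + Math.floor(rand() * (span + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async task; on failure wait RETRY_DELAY and try again,
// allowing up to MAX_RETRIES extra attempts before rethrowing the last error.
async function withRetries(task, config) {
  let lastError;
  for (let attempt = 0; attempt <= config.MAX_RETRIES; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < config.MAX_RETRIES) await sleep(config.RETRY_DELAY);
    }
  }
  throw lastError;
}
```

A scraping loop could then call await withRetries(() => fetchPage(cursor), config) for each page and await sleep(randomDelay(config)) between pages, where fetchPage and cursor are placeholders for whatever request the scraper actually makes.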

Usage

Twitter Collection

npm run twitter -- username

Example: npm run twitter -- pmarca

Collection with date range

npm run twitter -- username --start-date 2025-01-01 --end-date 2025-01-31
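The README does not say exactly how --start-date and --end-date are applied. One plausible reading, sketched here as an assumption, is an inclusive UTC day range: parseDateFlags pulls the two flags out of the argument list and filterByDateRange keeps tweets whose createdAt timestamp falls on or between those days. Both function names and the createdAt field are hypothetical.

```javascript
// Hypothetical sketch: the createdAt field name and UTC interpretation are assumptions.
const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;

// Extract --start-date / --end-date values from an argv-style array.
function parseDateFlags(argv) {
  const flags = {};
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--start-date") flags.startDate = argv[++i];
    else if (argv[i] === "--end-date") flags.endDate = argv[++i];
  }
  return flags;
}

// Convert YYYY-MM-DD to the epoch milliseconds of that day's UTC midnight.
function parseDay(value, flag) {
  if (!DATE_RE.test(value)) throw new Error(`${flag} must be YYYY-MM-DD, got "${value}"`);
  const ms = Date.parse(`${value}T00:00:00Z`);
  if (Number.isNaN(ms)) throw new Error(`${flag} is not a valid date: "${value}"`);
  return ms;
}

// Keep tweets created on or between startDate and endDate (both days inclusive).
function filterByDateRange(tweets, startDate, endDate) {
  const start = parseDay(startDate, "--start-date");
  // The end day is inclusive, so the cutoff is midnight at the start of the next day.
  const endExclusive = parseDay(endDate, "--end-date") + 24 * 60 * 60 * 1000;
  if (start >= endExclusive) throw new Error("--start-date must not be after --end-date");
  return tweets.filter((tweet) => {
    const t = Date.parse(tweet.createdAt);
    return !Number.isNaN(t) && t >= start && t < endExclusive;
  });
}
```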

Merge Characters

npm run merge-characters -- new-character-name character1 character2

Example: npm run merge-characters -- cobiedart cobie-2025-01-29 satsdart-2025-01-29
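The character file schema is not documented in this README. Assuming each character is a JSON object whose list-like fields (for example bio or postExamples, both hypothetical field names here) are arrays, a merge could take the union of every array field, drop exact duplicates, and assign the new character name. mergeCharacters is an illustrative helper, not the repo's implementation.

```javascript
// Hypothetical merge: union every array-valued field across the inputs,
// keep the first occurrence of each duplicate, and set the new name.
// Non-array fields (such as each input's own name) are not carried over.
function mergeCharacters(newName, characters) {
  const merged = { name: newName };
  for (const character of characters) {
    for (const [key, value] of Object.entries(character)) {
      if (!Array.isArray(value)) continue;
      const existing = merged[key] ?? [];
      // Compare items by their JSON form so objects dedupe too.
      const seen = new Set(existing.map((item) => JSON.stringify(item)));
      for (const item of value) {
        const id = JSON.stringify(item);
        if (!seen.has(id)) {
          seen.add(id);
          existing.push(item);
        }
      }
      merged[key] = existing;
    }
  }
  return merged;
}

// Usage with two placeholder inputs; the real tool would load these from the
// character directories named on the command line.
const example = mergeCharacters("new-character-name", [
  { name: "character1", bio: ["bio line A"], postExamples: ["post 1", "post 2"] },
  { name: "character2", bio: ["bio line A", "bio line B"], postExamples: ["post 1"] },
]);
console.log(JSON.stringify(example));
// → {"name":"new-character-name","bio":["bio line A","bio line B"],"postExamples":["post 1","post 2"]}
```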

Blog Collection

npm run blog
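Blog collection reads its sources from the file named by BLOG_URLS_FILE. The file format is not specified here; the sketch below assumes one URL per line with blank lines and # comments ignored, and shows how such a file could be validated and de-duplicated with the built-in URL class. parseBlogUrls is a hypothetical name.

```javascript
// Hypothetical parser for the BLOG_URLS_FILE contents.
// ASSUMED format: one URL per line; blank lines and lines starting with "#" are skipped.
function parseBlogUrls(text) {
  const urls = [];
  const seen = new Set();
  for (const [index, rawLine] of text.split(/\r?\n/).entries()) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    let url;
    try {
      url = new URL(line); // throws on malformed URLs
    } catch {
      throw new Error(`Line ${index + 1}: not a valid URL: "${line}"`);
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      throw new Error(`Line ${index + 1}: only http(s) URLs are supported`);
    }
    // Keep each normalized URL once, in file order.
    if (!seen.has(url.href)) {
      seen.add(url.href);
      urls.push(url.href);
    }
  }
  return urls;
}
```

The text itself could be read with fs.readFileSync(process.env.BLOG_URLS_FILE, "utf8") before calling parseBlogUrls.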

Generate Character

npm run character -- username

Example: npm run character -- pmarca

Finetune

npm run finetune

Finetune (with test)

npm run finetune:test

Generate Virtuals Character Card

https://whitepaper.virtuals.io/developer-documents/agent-contribution/contribute-to-cognitive-core#character-card-and-goal-samples

Run this after the Twitter Collection step.

npm run generate-virtuals -- username date

Example: npm run generate-virtuals -- pmarca 2024-11-29

Example without date: npm run generate-virtuals -- pmarca

The generated character file is written to pipeline/[username]/[date]/character/character.json, and the generated tweet dataset to pipeline/[username]/[date]/raw/tweets.json.

Generate Merged Character

npm run generate-merged-virtuals -- username date

Example: npm run generate-merged-virtuals -- pmarca 2024-11-29

The generated merged character file is written to pipeline/[username]/[date]/character/merged_character.json.
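Downstream scripts that consume these outputs need the same paths. The helper below is a hypothetical sketch that builds the documented locations for a given username and date; the YYYY-MM-DD check assumes the date format shown in the examples, and rejecting slashes in the username is an added safety measure, not documented behavior.

```javascript
// Hypothetical helper mirroring the documented output layout:
//   pipeline/[username]/[date]/character/character.json
//   pipeline/[username]/[date]/character/merged_character.json
//   pipeline/[username]/[date]/raw/tweets.json
function pipelinePaths(username, date, root = "pipeline") {
  // Guard against empty names and path traversal (an added assumption).
  if (!username || /[\\/]/.test(username) || username === "." || username === "..") {
    throw new Error(`invalid username: "${username}"`);
  }
  // Date format assumed from the README examples.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error(`date must be YYYY-MM-DD, got "${date}"`);
  }
  const base = `${root}/${username}/${date}`;
  return {
    character: `${base}/character/character.json`,
    mergedCharacter: `${base}/character/merged_character.json`,
    tweets: `${base}/raw/tweets.json`,
  };
}

console.log(pipelinePaths("pmarca", "2024-11-29").tweets);
// → pipeline/pmarca/2024-11-29/raw/tweets.json
```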
