🤖 Manual web scraper for pages with hot reload and pagination
Web crawler for scraping web pages
A Search Object is initialized, setting the criteria needed for the scrape to run and for the results to be saved as CSV and JSON files.
An async function handles the scrape, drawing its arguments from the Search Object; the browser (Chromium) is then launched and navigates to the given URL.
If more than one page is to be scraped, Puppeteer emulates clicks to navigate through the pages and stores the newly scraped data in an array before returning.
The entire scraper is written in JavaScript; the overall flow is sketched below.
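For orientation, here is a minimal sketch of that flow, assuming `puppeteer` is installed. `handleScrape` and `scrapePage` are hypothetical names standing in for the scraper's actual handler and extraction step, not the exact contents of `scraper.js`:

```js
const puppeteer = require('puppeteer');

//--> sketch of the scrape handler (scrapePage is a hypothetical extraction helper)
const handleScrape = async ({ source__url, total__pages }) => {
  const browser = await puppeteer.launch(); //--> launches bundled Chromium
  const page = await browser.newPage();
  await page.goto(source__url, { waitUntil: 'networkidle2' });

  let results = [];
  for (let currentPage = 1; currentPage <= total__pages; currentPage++) {
    results = results.concat(await scrapePage(page)); //--> extract items on this page
    //--> when total__pages > 1, pagination clicks happen here (see the pagination step)
  }

  await browser.close();
  return results;
};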
- Clone the repository and install packages:

```
npm install
```
- Navigate to the `Search__Object` object and enter initial values:
```js
const path = require('path'); //--> needed for path.resolve below

const Search__Object = {
  data__source: 'oddFEELING PortFolio',
  source__url: 'https://odd-portfolio.web.app/',
  total__pages: 2,
  Scrape__cli: 'false',
  JSON__name: 'Links__json',
  JSON__path: path.resolve(__dirname, './Scrapped__Data/JSON__files'),
  CSV__name: 'Links__csv',
  CSV__path: path.resolve(__dirname, './Scrapped__Data/CSV__files'),
};
```
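A minimal sketch of how these fields might be consumed when the output files are written, assuming the `convert-array-to-csv` package listed under the dependencies below; `saveResults` is a hypothetical helper name, not necessarily how `scraper.js` structures this:

```js
const fs = require('fs');
const path = require('path');
const { convertArrayToCSV } = require('convert-array-to-csv');

//--> hypothetical helper: write results using the names/paths from Search__Object
const saveResults = (results, opts) => {
  fs.writeFileSync(
    path.join(opts.JSON__path, `${opts.JSON__name}.json`),
    JSON.stringify(results, null, 2)
  );
  fs.writeFileSync(
    path.join(opts.CSV__path, `${opts.CSV__name}.csv`),
    convertArrayToCSV(results) //--> derives CSV headers from the objects' keys
  );
};
```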
- Navigate to the item selector and input an element selector:
```js
//--> select main query element
let items = document.querySelectorAll(`## Selector`);
```
Replace `## Selector` with a selector, e.g. `div.sc-fKFxtB.ivoVis > h3`.

🚩 This selects all `h3` elements inside any `div` with the class names `sc-fKFxtB` and `ivoVis`.
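With the example selector filled in, the line would read:

```js
let items = document.querySelectorAll(`div.sc-fKFxtB.ivoVis > h3`);
```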
- Navigate to the loop that pushes an object to the final result, then enter the values that should be extracted from the element. The attributes are taken from the `item` object:
```js
//--> loop through items and add to result
items.forEach((item) => {
  results.push({
    source: `oddFEELING portfolio`, //--> ##Source
    url: item.getAttribute('href'),
    content: item.textContent,
  });
});
```
🚩 This gets the element's `href` attribute and its `textContent`.
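For context, browser-side snippets like the two above typically run inside Puppeteer's `page.evaluate`, which executes in the page and returns the results to Node. A sketch under that assumption, not necessarily the exact structure of `scraper.js`:

```js
//--> run the extraction in the browser context and return the results to Node
const newResults = await page.evaluate(() => {
  let results = [];
  let items = document.querySelectorAll(`div.sc-fKFxtB.ivoVis > h3`);
  items.forEach((item) => {
    results.push({
      source: `oddFEELING portfolio`,
      url: item.getAttribute('href'),
      content: item.textContent,
    });
  });
  return results;
});
```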
- If the site to be scraped is paginated (makes an API call and renders items into different pages), Puppeteer needs to auto-click and navigate the pages. Specify the element to click by setting its selector in `page.click(## Paginator)`:
```js
//--> puppeteer auto click next button (pagination)
if (currentPage < pagesToScrape) {
  await Promise.all([
    page.click(`div.sc-fKFxtB.ivoVis > h3`),
    page.waitForSelector(`## selector`),
  ]);
}
```
Then set an element to wait for before continuing the scrape process in `page.waitForSelector(## selector)`:
```js
//--> puppeteer auto click next button (pagination)
if (currentPage < pagesToScrape) {
  await Promise.all([
    page.click(`div.sc-fKFxtB.ivoVis > h3`),
    page.waitForSelector(`div`),
  ]);
}
```
🚩 This waits for a `div` to render before continuing.
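Putting the pagination pieces together, one plausible shape for the loop that accumulates each page's results before returning; `scrapeAllPages` and `extractItems` are hypothetical names (`extractItems` standing in for the `page.evaluate` sketch above):

```js
//--> sketch: scrape every page, clicking through pagination between pages
const scrapeAllPages = async (page, pagesToScrape) => {
  let results = [];
  for (let currentPage = 1; currentPage <= pagesToScrape; currentPage++) {
    results = results.concat(await extractItems(page)); //--> hypothetical extraction wrapper

    if (currentPage < pagesToScrape) {
      await Promise.all([
        page.click(`## Paginator`),          //--> next-page button
        page.waitForSelector(`## selector`), //--> element that signals the new page rendered
      ]);
    }
  }
  return results;
};
```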
- You're set! Navigate to your terminal (or press `ctrl` + `shift` + `~`), then run the scraper with:

```
node scraper.js
```
Options to be filled:

- `## Selector` --> element to query
- `## Paginator` --> element to click for pagination
- `## source` --> source of data
- `$$ Name` --> names given to the written files
- `$$ boolean` --> true or false
- NODE - Yep! Node...... not Python
- PUPPETEER - Headless browser automation library
- CONVERT-ARRAY-TO-CSV - npm package for converting arrays to CSV
- @oddFEELING - Author and Owner
See also the list of contributors who participated in this project.