lubimy-czytac-scraper

A simple scraper that collects data about the top 100 most popular books of recent months on the Polish site lubimyczytac.pl (https://lubimyczytac.pl/top100). Includes code for both Selenium and Scrapy.

The Scrapy code is located in the scrapy_project folder, and the Selenium code in selenium_scraper.py.

Guide for running the Scrapy scraper:

To run the Scrapy scraper, open a terminal and create a Scrapy project in a directory of your choice with the following command (project_dir is optional and defaults to the project name):
```
scrapy startproject myproject [project_dir]
```
Then go to the project directory:
```
cd project_dir
```
Download the project_scrapy.py file and place it in the spiders folder of the new project (project_dir/myproject/spiders/). A single spider extracts all the information about the top 100 books for every month available on lubimyczytac.pl. The spider's name is "scrapy_books".
Run the spider with the following command:
```
scrapy crawl scrapy_books
```
To store the results in a .csv file, add -o file_name.csv (-o appends to an existing file; in recent Scrapy versions, -O overwrites it instead):
```
scrapy crawl scrapy_books -o books.csv
```

A sample output can be found in the books.csv file.
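To take a quick look at the exported data without extra dependencies, a minimal sketch using Python's standard csv module could look like the following. It assumes only that the file starts with a header row (Scrapy's CSV exporter writes one by default); the helper name summarize_csv is hypothetical, and no specific column names are assumed.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV export.

    Hypothetical helper for inspecting the output of
    `scrapy crawl scrapy_books -o books.csv`; assumes the first row is a header.
    """
    # newline="" is what the csv module expects; utf-8-sig also tolerates a leading BOM,
    # so Polish characters in titles and authors are decoded correctly either way.
    with Path(path).open(newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        row_count = sum(1 for _ in reader)  # consume the data rows
        columns = reader.fieldnames or []   # None if the file is empty
    return list(columns), row_count
```

For example, `columns, count = summarize_csv("books.csv")` returns the header fields of the sample output and the number of scraped rows.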

marcin-karlinski/lubimy-czytac-scraper

Folders and files

Latest commit

History

Repository files navigation

lubimy-czytac-scraper

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages