HasithaKutala/Web-Scraping

Web-Scraping

What is Web Scraping?

Web scraping is an automated method of collecting large amounts of data from websites. Most of this data is unstructured HTML, which is then converted into structured data, such as a spreadsheet or a database.

Web scraping involves two parts: the crawler and the scraper. The crawler is a program that browses the web, following links to discover the pages that contain the required data. The scraper is the tool that then extracts the data from those pages.

How does it work?

When you run web scraping code, a request is sent to the URL you specify. In response, the server returns the page's HTML or XML content. The code then parses that HTML or XML, locates the data of interest, and extracts it.
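The parse-and-extract step above can be sketched with just the Python standard library. The HTML string below is a made-up stand-in for a server response; in a real scraper it would come from an HTTP request to the target URL:

```python
from html.parser import HTMLParser

# Stand-in for the HTML a server would return (illustrative only).
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/page1">Page 1</a>
  <a href="https://example.com/page2">Page 2</a>
</body></html>
"""

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed(SAMPLE_HTML)
print(parser.links)  # the two hrefs found in the document
```

Dedicated libraries such as BeautifulSoup handle malformed HTML and tree navigation far more conveniently, but the core idea is the same: walk the parsed document and pull out the pieces you need.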

Libraries

  • BeautifulSoup: one of the most helpful Python web scraping libraries; it parses HTML and XML documents into a tree structure that makes it easy to identify and extract data.
  • Scrapy: a web crawling and scraping framework for quickly and efficiently crawling websites and extracting structured data from their pages. It is also used for monitoring, automated testing, and data mining.
  • Selenium: a browser automation library, originally built for web testing. Because it drives a real browser, it is useful for scraping pages that render their content with JavaScript.
  • Requests: a simple HTTP library for Python, used to send HTTP requests and retrieve responses.
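As a small illustration of the BeautifulSoup tree navigation described above, the sketch below extracts titles and prices from a snippet of HTML. The snippet and its class names are invented for the example; in practice the HTML would be downloaded with Requests first:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical HTML standing in for a downloaded page (illustrative only).
html = """
<ul class="books">
  <li class="book"><span class="title">Dune</span> <span class="price">$9</span></li>
  <li class="book"><span class="title">Neuromancer</span> <span class="price">$8</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigate the parse tree: find each book entry, then pull out its fields.
books = [
    (li.find("span", class_="title").get_text(),
     li.find("span", class_="price").get_text())
    for li in soup.find_all("li", class_="book")
]
print(books)
```

The same pattern scales to real pages: inspect the site's markup, pick selectors that identify the elements you want, and iterate over the matches.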
