HasithaKutala/Web-Scraping

Web-Scraping

What is Web Scraping?

Web scraping is an automated method of collecting large amounts of data from websites. Most of this data is unstructured HTML, which is then converted into structured data, such as a spreadsheet or a database.

Web scraping involves two parts: the crawler and the scraper. The crawler is a program that browses the web, following links to discover the pages that contain the required data. The scraper is the tool that then extracts the data from those pages.

How does it work?

When you run web scraping code, a request is sent to the URL you specify. In response, the server returns the page's HTML or XML content. The code then parses that HTML or XML, locates the data of interest, and extracts it.
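The parse-and-extract step above can be sketched with just the Python standard library. The HTML string below is a made-up stand-in for a server response; in a real scraper it would come from an HTTP request to the target URL:

```python
from html.parser import HTMLParser

# Stand-in for the HTML a server would return (illustrative only).
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/page1">Page 1</a>
  <a href="https://example.com/page2">Page 2</a>
</body></html>
"""

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed(SAMPLE_HTML)
print(parser.links)  # the two hrefs found in the document
```

Dedicated libraries such as BeautifulSoup handle malformed HTML and tree navigation far more conveniently, but the core idea is the same: walk the parsed document and pull out the pieces you need.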

Libraries

  • BeautifulSoup: one of the most helpful Python web scraping libraries; it parses HTML and XML documents into a tree structure that makes it easy to identify and extract data.
  • Scrapy: a web crawling and scraping framework for quickly and efficiently crawling websites and extracting structured data from their pages. It is also used for monitoring, automated testing, and data mining.
  • Selenium: a browser automation library, originally built for web testing. Because it drives a real browser, it is useful for scraping pages that render their content with JavaScript.
  • Requests: a simple HTTP library for Python, used to send HTTP requests and retrieve responses.
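As a small illustration of the BeautifulSoup tree navigation described above, the sketch below extracts titles and prices from a snippet of HTML. The snippet and its class names are invented for the example; in practice the HTML would be downloaded with Requests first:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical HTML standing in for a downloaded page (illustrative only).
html = """
<ul class="books">
  <li class="book"><span class="title">Dune</span> <span class="price">$9</span></li>
  <li class="book"><span class="title">Neuromancer</span> <span class="price">$8</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigate the parse tree: find each book entry, then pull out its fields.
books = [
    (li.find("span", class_="title").get_text(),
     li.find("span", class_="price").get_text())
    for li in soup.find_all("li", class_="book")
]
print(books)
```

The same pattern scales to real pages: inspect the site's markup, pick selectors that identify the elements you want, and iterate over the matches.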
