scalable web scraper framework for finding documents on websites.

thequbit/BarkingOwl

BarkingOwl

Join the chat at https://gitter.im/thequbit/BarkingOwl

BarkingOwl is a scalable web crawler intended to be used to find specific document types such as PDFs.
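
To illustrate what "finding specific document types" can mean in practice, here is a minimal sketch that pulls links to PDFs out of a page's HTML using only the Python standard library. It is not BarkingOwl's API: the names `DocumentLinkParser` and `find_document_links` are hypothetical, and matching on the file extension in the URL path is an assumption made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class DocumentLinkParser(HTMLParser):
    """Collect absolute URLs of <a> links whose path ends with a wanted extension."""

    def __init__(self, base_url, extensions=(".pdf",)):
        super().__init__()
        self.base_url = base_url
        # Normalize extensions so the comparison is case-insensitive.
        self.extensions = tuple(ext.lower() for ext in extensions)
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Compare only the path, so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(self.extensions):
            self.found.append(absolute)


def find_document_links(html, base_url, extensions=(".pdf",)):
    """Return the document links found in `html`, in page order."""
    parser = DocumentLinkParser(base_url, extensions)
    parser.feed(html)
    return parser.found
```

Calling `find_document_links(page_html, "http://example.com/town/")` returns a list of absolute PDF URLs; passing `extensions=(".pdf", ".doc")` widens the search to other document types.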

Not a hard-core hacker? Check out the web front-end tool for BarkingOwl.

####Background and Description####

BarkingOwl came out of a need presented at a Hacks and Hackers Rochester (#hhroc) meet-up in Syracuse, NY. A journalist wanted a tool that would help him search for keywords within PDFs posted to town websites, such as meeting minutes.

####Objective####

I wanted to make the code for this project as reusable as possible, because I knew it had several parallels to other work I had been doing and wanted to do in the future. The solution was an architecture that would allow for significant scalability and extensibility.

####How to get started####

BarkingOwl is available on PyPI, so it can be installed using pip:

> pip install barkingowl

To use BarkingOwl you will need to install RabbitMQ. Information on how to install RabbitMQ can be found here: http://www.rabbitmq.com/download.html
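
Before starting BarkingOwl, it can help to confirm that a RabbitMQ broker is actually accepting connections. The sketch below is a plain TCP check, not part of BarkingOwl; the function name `rabbitmq_reachable` is hypothetical, and it assumes the broker runs on `localhost` at RabbitMQ's default AMQP port, 5672.

```python
import socket


def rabbitmq_reachable(host="localhost", port=5672, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    This only checks that something is listening on the port; it does not
    perform an AMQP handshake or verify credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False
```

For example, `rabbitmq_reachable()` returning `False` right after installation usually means the RabbitMQ service has not been started, or is listening on a different host or port.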

####Documentation####

Check out the wiki!
