
Commit 0f991d1

Corrects pytest.ini minversion
1 parent b324358 commit 0f991d1

File tree

3 files changed, +156 −1 lines changed

2024-06-10-backup-job-scrape.db

244 KB
Binary file not shown.

README-extra-notes.md

+155
@@ -0,0 +1,155 @@
# Job Vacancy Web Scraping Project

## Overview

A demo project for practicing Python and machine-learning technologies.

The project extracts job postings from job sites and analyzes the data.
## Prerequisites

Python 3.6 or higher.

See `requirements.txt` in the repository.
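Before setting anything up, you can confirm that your interpreter meets the Python 3.6+ requirement with a quick check:

```python
import sys

# Verify the interpreter meets the Python 3.6+ requirement.
assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"
print(sys.version.split()[0])
```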
## Getting Started (Linux/Unix instructions)

1. **Clone this repo from GitHub.**
2. **Create a Python virtual environment in the folder that contains this repo:**

```bash
python -m venv venv
```

3. **Activate the virtual environment:**

```bash
source venv/bin/activate
```

4. **Install the project dependencies:**

```bash
pip install --no-cache-dir --upgrade -r requirements.txt
```

5. **Navigate to the application folder:**

```bash
cd app
```

6. **Set the PYTHONPATH environment variable:**

```bash
export PYTHONPATH="./"
```

7. **First-time database setup: create the SQLite database that stores the jobs:**

```bash
python scripts/create_database.py
```

8. **Create a directory to store logs:**

```bash
mkdir ../logs
```

9. **Execute the program to scrape jobs (first see "Set job search parameters"):**

```bash
python scrape_jobs.py
```
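The `scripts/create_database.py` script from step 7 is not shown in this commit; as a rough sketch, creating such a jobs database might look like the following (the table and column names are assumptions, not the project's actual schema):

```python
import sqlite3

# A sketch of first-time database setup (cf. scripts/create_database.py,
# which is not included here); table and column names are assumptions.
conn = sqlite3.connect("jobs.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS jobs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        company TEXT,
        location TEXT,
        url TEXT UNIQUE
    )
    """
)
conn.commit()

# List the tables to confirm the creation succeeded.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
conn.close()
```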
### Set job search parameters

Currently the only scraped site is [Jobserve](https://www.jobserve.com/). If you want to store a particular set of jobs in the database, you can fill in the Jobserve Job Search form and perform your search, then use the session id (`shid`) that appears in the browser query string to target that set. To do this, follow these steps:

1. Go to `https://www.jobserve.com/gb/en/Job-Search/` and set the values for your job search in the search form.
2. After hitting the Search button, you will be redirected to a search results page.
3. Obtain the `session-id` value from the URL: `https://www.jobserve.com/gb/en/JobSearch.aspx?shid=<session-id>`
4. Use `config-base.json` to populate the value of the `shid`:

```json
{
    "jobserve-shid": "<session-id>"
}
```

5. Rename `config-base.json` to `config.json`.
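Once `config.json` is in place, the scraper can read the session id back and build the targeted search URL. A minimal sketch (only the `jobserve-shid` key comes from `config-base.json`; the project's actual loading code may differ):

```python
import json

# Write an example config.json; in practice this file is created by
# renaming config-base.json. The shid value here is a placeholder.
with open("config.json", "w") as f:
    json.dump({"jobserve-shid": "<session-id>"}, f)

# Read the session id back and build the targeted search URL.
with open("config.json") as f:
    config = json.load(f)

search_url = f"https://www.jobserve.com/gb/en/JobSearch.aspx?shid={config['jobserve-shid']}"
print(search_url)
```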
NOTE: After a few days without accessing Jobserve with this `session-id`, it will expire and you will need to repeat the search as explained in the previous steps.

**Additional Notes**:

- Ensure you have Python 3.6 or higher installed on your system.
- Run the unit tests from the `app` folder by executing `pytest`.
- Check the logs in the `../logs` directory if you encounter any issues running the application.
Build the Docker image:

```bash
docker build -t scrapejobs .
```

Run the Docker container:

```bash
docker run -p 8000:80 scrapejobs
```

Inspect the application endpoints from the browser at:

```
http://127.0.0.1:8000/docs
```
### Execute on WSL or a Docker container

See: https://stackoverflow.com/questions/75012949/how-to-run-chrome-headless-in-docker-container-with-selenium

Install Google Chrome:

```bash
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
```

Determine the installed Chrome major version:

```bash
CHROME_VERSION=$(google-chrome --version | cut -d ' ' -f 3 | cut -d '.' -f 1)
echo $CHROME_VERSION
```

Download `chromedriver-linux64.zip` from https://googlechromelabs.github.io/chrome-for-testing/, choosing the same major version reported by `echo $CHROME_VERSION`. Then unzip the archive, enter the extracted `chromedriver-linux64` folder, and move the `chromedriver` binary into place:

```bash
unzip chromedriver-linux64.zip
cd chromedriver-linux64
sudo mv chromedriver /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver
```

See also https://sites.google.com/chromium.org/driver/downloads: if you are using Chrome version 115 or newer, consult the Chrome for Testing availability dashboard, which provides convenient JSON endpoints for downloading specific ChromeDriver versions.
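The shell pipeline that extracts the Chrome major version can be mirrored in Python to make the field/delimiter logic explicit (the sample string below is an illustrative example of `google-chrome --version` output, not a pinned version):

```python
# Mirrors the shell pipeline: cut -d ' ' -f 3 | cut -d '.' -f 1
# The sample string is an example of `google-chrome --version` output.
version_output = "Google Chrome 126.0.6478.126"
major_version = version_output.split(" ")[2].split(".")[0]
print(major_version)  # → 126
```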
## Project Structure

- [Outline the project's file structure and explain the purpose of each file/directory]

## Data Source

[Specify the job posting site you chose for web scraping and provide a brief justification for your choice. If you encountered any anti-scraping filters, mention how you handled them.]

Scraping job advertisements from the [JobServe](https://www.jobserve.com/) site.

## Data Processing

[Explain the steps involved in processing the scraped data, such as cleaning, transforming, or filtering the data.]

## Analysis and Visualization

[Describe the techniques and tools you used for analyzing and visualizing the data. You can include sample visualizations or screenshots in this section.]

## Findings and Recommendations

[Summarize your key findings and provide recommendations based on your analysis. This section should highlight the insights gained from the project and how they can help the recruitment agency achieve its objectives.]

## Limitations and Future Improvements

[Discuss any limitations or challenges you faced during the project and suggest potential improvements or future enhancements.]

## License

This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).

pytest.ini (+1 −1)

@@ -1,5 +1,5 @@
 [pytest]
-minversion = 8.0
+minversion = 7.0
 addopts = -ra -q --capture=no --cov=. --durations=0
 testpaths = app/tests
 python_files = test_*.py
