
Commit 6394633

Merge branch 'main' into 6-scheduling
2 parents: 3604267 + b65050e

15 files changed: +271 -163 lines

.github/workflows/test-docker.yml: deleted (-37 lines)

.github/workflows/test-k8s.yml: deleted (-47 lines)

.github/workflows/test-manifest.yml: deleted (-61 lines)

.github/workflows/test.yml: new file (+154 lines)
```yaml
name: Scrapyd-k8s CI
on:
  push:
    branches:
      - main
  pull_request:

jobs:
  test-unit:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-test.txt

      - name: Run tests
        run: pytest -vv --color=yes scrapyd_k8s/tests/unit/

  test-docker:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-test.txt

      - name: Pull example spider
        run: docker pull ghcr.io/q-m/scrapyd-k8s-spider-example

      - name: Run scrapyd-k8s
        run: |
          cp scrapyd_k8s.sample-docker.conf scrapyd_k8s.conf
          python -m scrapyd_k8s &
          while ! nc -q 1 localhost 6800 </dev/null; do sleep 1; done
          curl http://localhost:6800/daemonstatus.json

      - name: Run tests
        run: pytest -vv --color=yes scrapyd_k8s/tests/integration/

  test-manifest:
    container:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-test.txt

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build container
        uses: docker/build-push-action@v5
        with:
          context: .
          push: false
          load: true
          tags: test:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Start minikube
        uses: medyagh/setup-minikube@master

      - name: Deploy to minikube
        run: |
          minikube image load test:latest
          # already pull image so we don't have to wait for it later
          minikube image pull ghcr.io/q-m/scrapyd-k8s-spider-example:latest
          # load manifest
          sed -i 's/\(imagePullPolicy:\s*\)\w\+/\1Never/' kubernetes.yaml
          sed -i 's/\(image:\s*\)ghcr\.io\/q-m\/scrapyd-k8s:/\1test:/' kubernetes.yaml
          sed -i 's/\(type:\s*\)ClusterIP/\1NodePort/' kubernetes.yaml
          kubectl create -f kubernetes.yaml
          # and wait for scrapyd-k8s to become ready
          kubectl wait --for=condition=Available deploy/scrapyd-k8s --timeout=60s
          curl --retry 10 --retry-delay 2 --retry-all-errors `minikube service scrapyd-k8s --url`/daemonstatus.json

      - name: Run tests
        run: |
          TEST_WITH_K8S=1 \
          TEST_BASE_URL=`minikube service scrapyd-k8s --url` \
          TEST_MAX_WAIT=60 \
          TEST_AVAILABLE_VERSIONS=latest,`skopeo list-tags docker://ghcr.io/q-m/scrapyd-k8s-spider-example | jq -r '.Tags | map(select(. != "latest" and (startswith("sha-") | not))) | join(",")'` \
          pytest -vv --color=yes scrapyd_k8s/tests/integration/
  test-k8s:
    container:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-test.txt

      - name: Start minikube
        uses: medyagh/setup-minikube@master

      - name: Prepare Kubernetes environment
        run: |
          kubectl create secret generic example-env-secret --from-literal=FOO_1=bar
          kubectl create configmap example-env-configmap --from-literal=FOO_2=baz
          # already pull image so we don't have to wait for it later
          minikube image pull ghcr.io/q-m/scrapyd-k8s-spider-example:latest

      - name: Run scrapyd-k8s
        run: |
          cp scrapyd_k8s.sample-k8s.conf scrapyd_k8s.conf
          python -m scrapyd_k8s &
          while ! nc -q 1 localhost 6800 </dev/null; do sleep 1; done
          curl http://localhost:6800/daemonstatus.json

      - name: Run tests
        run: |
          TEST_WITH_K8S=1 \
          TEST_MAX_WAIT=60 \
          TEST_AVAILABLE_VERSIONS=latest,`skopeo list-tags docker://ghcr.io/q-m/scrapyd-k8s-spider-example | jq -r '.Tags | map(select(. != "latest" and (startswith("sha-") | not))) | join(",")'` \
          pytest -vv --color=yes scrapyd_k8s/tests/integration/
```

CONFIG.md: +60 -5 lines
```diff
@@ -1,8 +1,9 @@
-## About
-This file provides you with the detailed description of parameters listed in the config file, and explaining why they are used
-and when you are expected to provide or change them.
+# scrapyd-k8s configuration
 
-## Configuration file
+scrapyd-k8s is configured with the file `scrapyd_k8s.conf`. The file format is meant to
+stick to [scrapyd's configuration](https://scrapyd.readthedocs.io/en/latest/config.html) where possible.
+
+## `[scrapyd]` section
 
 * `http_port` - defaults to `6800` ([](https://scrapyd.readthedocs.io/en/latest/config.html#http-port))
 * `bind_address` - defaults to `127.0.0.1` ([](https://scrapyd.readthedocs.io/en/latest/config.html#bind-address))
```
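For illustration, a minimal `scrapyd_k8s.conf` using the two options documented in this hunk might begin as follows. This is a sketch, not taken from the commit; the values shown are simply the documented defaults:

```ini
[scrapyd]
# port and address the scrapyd-compatible API listens on (documented defaults)
http_port    = 6800
bind_address = 127.0.0.1
```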
```diff
@@ -35,4 +36,58 @@ exponentially and is calculated as `backoff_time *= self.backoff_coefficient`.
 
 ### When do I need to change it in the config file?
 Default values for these parameters are provided in the code and are tuned to an "average" cluster setting. If your network
-requirements or other conditions are unusual, you may need to adjust these values to better suit your specific setup.
+requirements or other conditions are unusual, you may need to adjust these values to better suit your specific setup.
+
+## project sections
+
+Each project you want to be able to run, gets its own section, prefixed with `project.`. For example,
+consider an `example` spider, this would be defined in a `[project.example]` section.
+
+* `repository` - container repository for the project, e.g. `ghcr.io/q-m/scrapyd-k8s-spider-example`
+
+## Docker
+
+This section describes Docker-specific options.
+See [`scrapyd_k8s.sample-docker.conf`](scrapyd_k8s.sample-docker.conf) for an example.
+
+* `[scrapyd]` `launcher` - set this to `scrapyd_k8s.launcher.Docker`
+* `[scrapyd]` `repository` - choose between `scrapyd_k8s.repository.Local` and `scrapyd_k8s.repository.Remote`
+
+TODO: explain `Local` and `Remote` repository, and how to use them
+
+## Kubernetes
+
+This section describes Kubernetes-specific options.
+See [`scrapyd_k8s.sample-k8s.conf`](scrapyd_k8s.sample-k8s.conf) for an example.
+
+* `[scrapyd]` `launcher` - set this to `scrapyd_k8s.launcher.K8s`
+* `[scrapyd]` `repository` - set this to `scrapyd_k8s.repository.Remote`
+
+For Kubernetes, it is important to set resource limits.
+
+TODO: explain how to set limits, with default, project and spider specificity.
+
+
+### Kubernetes API interaction
+
+The Kubernetes event watcher is used in the code as part of the joblogs feature and is also utilized for limiting the
+number of jobs running in parallel on the cluster. Both features are not enabled by default and can be activated if you
+choose to use them.
+
+The event watcher establishes a connection to the Kubernetes API and receives a stream of events from it. However, the
+nature of this long-lived connection is unstable; it can be interrupted by network issues, proxies configured to terminate
+long-lived connections, and other factors. For this reason, a mechanism was implemented to re-establish the long-lived
+connection to the Kubernetes API. To achieve this, three parameters were introduced: `reconnection_attempts`,
+`backoff_time` and `backoff_coefficient`.
+
+#### What are these parameters about?
+
+* `reconnection_attempts` - defines how many consecutive attempts will be made to reconnect if the connection fails;
+* `backoff_time`, `backoff_coefficient` - are used to gradually slow down each subsequent attempt to establish a
+connection with the Kubernetes API, preventing the API from becoming overloaded with requests.
+The `backoff_time` increases exponentially and is calculated as `backoff_time *= self.backoff_coefficient`.
+
+#### When do I need to change it in the config file?
+
+Default values for these parameters are provided in the code and are tuned to an "average" cluster setting. If your network
+requirements or other conditions are unusual, you may need to adjust these values to better suit your specific setup.
```
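Putting the new documentation sections together, here is a hedged sketch of how a `scrapyd_k8s.conf` might combine them for the Kubernetes backend. The `example` project name and its image come from the diff above; the `bind_address` value is an assumption for in-cluster reachability (the documented default is `127.0.0.1`), and for Docker you would set `launcher = scrapyd_k8s.launcher.Docker` with `Local` or `Remote` as the repository:

```ini
[scrapyd]
# assumption: listen on all interfaces so the service is reachable in the cluster
bind_address = 0.0.0.0
http_port    = 6800

# Kubernetes backend, as documented in the Kubernetes section above
launcher   = scrapyd_k8s.launcher.K8s
repository = scrapyd_k8s.repository.Remote

# one section per runnable project, prefixed with "project."
[project.example]
repository = ghcr.io/q-m/scrapyd-k8s-spider-example
```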

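As a worked example of the backoff formula `backoff_time *= backoff_coefficient`: with the illustrative values below (not the shipped defaults), consecutive reconnection attempts would wait 5, 10, 20, 40 and 80 seconds before the watcher gives up after the fifth failure. The diff does not show which config section these keys live in, so the `[scrapyd]` placement here is an assumption:

```ini
[scrapyd]
# illustrative values, not the shipped defaults; section placement is an assumption
reconnection_attempts = 5
backoff_time          = 5
backoff_coefficient   = 2
```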