Commit b7780ee

Add datagen Kubernetes guide (#123)
1 parent d302e28 commit b7780ee

4 files changed, +350 -6 lines changed

examples/README.md

+7-6
@@ -2,11 +2,12 @@

 This directory contains end-to-end tutorials for the `datagen` tool.

-| Tutorial | Description |
-| -------- | ----------- |
-| [ecommerce](ecommerce) | A tutorial for the `datagen` tool that generates data for an ecommerce website. |
-| [docker-compose](docker-compose) | A `docker-compose` setup for the `datagen`. |
-| [blog](blog) | Sample data for a blog with users, posts, and comments. |
-| [webhook](webhook) | A tutorial for the `datagen` tool that generates data for a webhook. |
+| Tutorial | Description |
+| -------------------------------- | ------------------------------------------------------------------------------------------------ |
+| [ecommerce](ecommerce) | A tutorial for the `datagen` tool that generates data for an ecommerce website. |
+| [docker-compose](docker-compose) | A `docker-compose` setup for the `datagen`. |
+| [blog](blog) | Sample data for a blog with users, posts, and comments. |
+| [webhook](webhook) | A tutorial for the `datagen` tool that generates data for a webhook. |
+| [kubernetes](kubernetes) | A tutorial for the `datagen` tool that deploys to Kubernetes alongside a Redpanda Kafka cluster. |

 To request a new tutorial, please [open an issue](https://github.com/MaterializeInc/datagen/issues/new?assignees=&labels=feature%2C+enhancement&template=feature_request.md&title=Feature%3A+).

examples/kubernetes/README.md

+221
@@ -0,0 +1,221 @@
# Kubernetes Example

This example demonstrates how to deploy the datagen tool to Kubernetes alongside a Redpanda Kafka cluster.

## Overview

The example includes:
- A single-node Redpanda deployment for Kafka
- A datagen deployment that produces data to Redpanda
- A ConfigMap that stores the datagen schema
- Associated Kubernetes services

## Prerequisites

- A Kubernetes cluster
- `kubectl` configured to interact with your cluster
- Basic understanding of Kubernetes concepts (Deployments, Services, ConfigMaps)

## Setup

1. First, create a namespace for the resources (if it does not already exist):

```bash
kubectl create namespace materialize
```

2. Apply the Kubernetes manifests, which will create the datagen and Redpanda deployments:

```bash
kubectl apply -f examples/kubernetes/datagen.yaml
kubectl apply -f examples/kubernetes/redpanda.yaml
```
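The two setup steps can also be wrapped so that they are safe to re-run. Applying `datagen.yaml` first means the datagen pod may start before the broker is reachable, and `kubectl create namespace` reports an error if the namespace already exists. The following is a minimal sketch, assuming it is run from the repository root; `deploy_example`, `KUBECTL`, and `NAMESPACE` are illustrative names and not part of this guide.

```shell
# Sketch only: KUBECTL, NAMESPACE, and deploy_example are illustrative names.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-materialize}"

deploy_example() {
  # Render the namespace client-side and apply it, so re-running is harmless.
  "$KUBECTL" create namespace "$NAMESPACE" --dry-run=client -o yaml \
    | "$KUBECTL" apply -f - || return 1

  # Start the broker first so datagen has something to connect to.
  "$KUBECTL" apply -f examples/kubernetes/redpanda.yaml || return 1
  "$KUBECTL" rollout status deployment/redpanda -n "$NAMESPACE" --timeout=180s || return 1

  # Then start the producer and wait for its rollout as well.
  "$KUBECTL" apply -f examples/kubernetes/datagen.yaml || return 1
  "$KUBECTL" rollout status deployment/datagen -n "$NAMESPACE" --timeout=180s
}
```

Calling `deploy_example` runs the whole sequence. Note that `rollout status` only waits until the pods are reported ready; because the Redpanda manifest defines a liveness probe but no readiness probe, this does not guarantee that the Kafka listener is already accepting connections.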

## Manifest Details

The deployment consists of several Kubernetes resources. Let's examine each one:

### 1. Schema ConfigMap

This ConfigMap stores the schema definition that datagen will use to generate data:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: datagen-schema
  namespace: materialize
data:
  schema.json: |
    [
      {
        "_meta": {
          "topic": "mz_datagen_test"
        },
        "id": "iteration.index",
        "name": "faker.internet.userName()"
      }
    ]
```

You can customize the schema to generate different data. For more information, see the datagen [README](../../README.md) file.
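If the schema is kept in a standalone file, the ConfigMap can be generated instead of edited by hand. The sketch below uses only POSIX shell; `render_schema_configmap` is a hypothetical helper that indents the file by four spaces so that it fits under the `schema.json: |` block scalar shown above.

```shell
# Hypothetical helper: print a ConfigMap manifest that embeds the given schema file.
render_schema_configmap() {
  schema_file="$1"
  # Fixed header, matching the name and namespace used in this example.
  cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: datagen-schema
  namespace: materialize
data:
  schema.json: |
EOF
  # Indent every schema line by four spaces to place it inside the block scalar.
  sed 's/^/    /' "$schema_file"
}
```

A possible use is `render_schema_configmap my-schema.json | kubectl apply -f -`, where `my-schema.json` is a placeholder file name. A pod that is already running will most likely need a restart, for example with `kubectl rollout restart deployment/datagen -n materialize`, before it picks up the new schema.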

### 2. Datagen Deployment

The datagen deployment uses the official `materialize/datagen` image and mounts the schema `ConfigMap`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datagen
  namespace: materialize
spec:
  replicas: 1
  selector:
    matchLabels:
      app: datagen
  template:
    metadata:
      labels:
        app: datagen
    spec:
      containers:
        - name: datagen
          image: materialize/datagen:latest
          args:
            [
              "datagen",
              "-s", "/schemas/schema.json",
              "-f", "json",
              "-n", "10024",
              "-w", "2000",
              "-d"
            ]
          env:
            - name: KAFKA_BROKERS
              value: "redpanda.materialize.svc.cluster.local:9092"
          volumeMounts:
            - name: datagen-schema-volume
              mountPath: /schemas
              readOnly: true
      volumes:
        - name: datagen-schema-volume
          configMap:
            name: datagen-schema
```
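The `-n` and `-w` arguments together bound how long one pod keeps producing. Assuming, as the datagen README describes, that `-n` is the number of records and `-w` is the wait in milliseconds between records, a back-of-the-envelope estimate is simply their product. The helper name below is hypothetical, and the estimate ignores the time spent producing each record.

```shell
# Hypothetical helper: rough runtime in whole seconds for a bounded datagen run.
# $1 = number of records (-n), $2 = wait between records in milliseconds (-w)
estimate_runtime_seconds() {
  records="$1"
  wait_ms="$2"
  echo $(( records * wait_ms / 1000 ))
}
```

With the values from the manifest, `estimate_runtime_seconds 10024 2000` prints `20048`, which is roughly 5 hours 34 minutes. Because a Deployment always restarts its containers, a datagen process that exits after its last record would be started again by Kubernetes.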

### 3. Redpanda Deployment and Service

The Redpanda deployment provides a Kafka-compatible message broker:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redpanda
  namespace: materialize
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redpanda
  template:
    metadata:
      labels:
        app: redpanda
    spec:
      containers:
        - name: redpanda
          image: docker.vectorized.io/vectorized/redpanda:v23.3.5
          command: ["/usr/bin/rpk"]
          args: [
              "redpanda",
              "start",
              "--overprovisioned",
              "--smp", "1",
              "--memory", "1G",
              "--reserve-memory", "0M",
              "--node-id", "0",
              "--check=false",
              "--kafka-addr", "0.0.0.0:9092",
              "--advertise-kafka-addr", "redpanda.materialize.svc.cluster.local:9092",
              "--pandaproxy-addr", "0.0.0.0:8082",
              "--advertise-pandaproxy-addr", "redpanda.materialize.svc.cluster.local:8082",
              "--set", "redpanda.enable_transactions=true",
              "--set", "redpanda.enable_idempotence=true",
              "--set", "redpanda.auto_create_topics_enabled=true",
              "--set", "redpanda.default_topic_partitions=1"
            ]
          ports:
            - containerPort: 9092
            - containerPort: 8081
            - containerPort: 8082
          livenessProbe:
            httpGet:
              path: /v1/status/ready
              port: 9644
            initialDelaySeconds: 30
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: redpanda
  namespace: materialize
spec:
  selector:
    app: redpanda
  ports:
    - name: kafka
      protocol: TCP
      port: 9092
      targetPort: 9092
    - name: pandaproxy
      protocol: TCP
      port: 8082
      targetPort: 8082
```
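Both `KAFKA_BROKERS` in the datagen manifest and `--advertise-kafka-addr` here use the in-cluster DNS name of the `redpanda` Service, which follows the pattern `<service>.<namespace>.svc.<cluster-domain>`. The sketch below only assembles that string; `service_fqdn` is an illustrative helper, and `cluster.local` is the common default cluster domain, which a given cluster may override.

```shell
# Illustrative helper: build the in-cluster address of a Service.
# $1 = service name, $2 = namespace, $3 = port, $4 = cluster domain (optional)
service_fqdn() {
  printf '%s.%s.svc.%s:%s\n' "$1" "$2" "${4:-cluster.local}" "$3"
}
```

For example, `service_fqdn redpanda materialize 9092` prints `redpanda.materialize.svc.cluster.local:9092`. If the example is deployed into a different namespace, both the datagen environment variable and the Redpanda advertised addresses need to change accordingly.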

## Verifying the Deployment

1. Check if the pods are running:

```bash
kubectl get pods -n materialize
```

2. View datagen logs:

```bash
kubectl logs -f deployment/datagen -n materialize
```

3. View Redpanda logs:

```bash
kubectl logs -f deployment/redpanda -n materialize
```
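Logs show that the processes are running, but not that records actually reach the topic. One way to check, sketched here under the assumption that the `rpk` binary in the Redpanda container can reach the broker with its default settings, is to list topics and read a few records from inside the pod. The function and variable names are illustrative.

```shell
# Sketch only: KUBECTL, NAMESPACE, TOPIC, and both functions are illustrative names.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-materialize}"
TOPIC="${TOPIC:-mz_datagen_test}"

# Print the topics known to the broker.
list_topics() {
  "$KUBECTL" exec -n "$NAMESPACE" deployment/redpanda -- rpk topic list
}

# Read a handful of records from the topic, then exit.
# $1 = number of records to read (defaults to 5)
peek_records() {
  count="${1:-5}"
  "$KUBECTL" exec -n "$NAMESPACE" deployment/redpanda -- \
    rpk topic consume "$TOPIC" --num "$count"
}
```

Running `list_topics` should include `mz_datagen_test` once datagen has produced its first record, and `peek_records 3` should print three JSON records with `id` and `name` fields matching the schema.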

## Scaling

You can scale the datagen deployment to produce more data in parallel:

```bash
kubectl scale deployment datagen -n materialize --replicas=3
```

## Cleanup

To remove all resources:

```bash
kubectl delete namespace materialize
```

## Useful Links

- [Materialize documentation](https://materialize.com/docs/)
- [Materialize community Slack](https://materialize.com/s/chat)
- [Materialize Blog](https://materialize.com/blog/)
- [Kubernetes documentation](https://kubernetes.io/docs/home/)

examples/kubernetes/datagen.yaml

+56
@@ -0,0 +1,56 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: datagen-schema
  namespace: materialize
data:
  schema.json: |
    [
      {
        "_meta": {
          "topic": "mz_datagen_test"
        },
        "id": "iteration.index",
        "name": "faker.internet.userName()"
      }
    ]

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datagen
  namespace: materialize
spec:
  replicas: 1
  selector:
    matchLabels:
      app: datagen
  template:
    metadata:
      labels:
        app: datagen
    spec:
      containers:
        - name: datagen
          image: materialize/datagen:latest
          args:
            [
              "datagen",
              "-s", "/schemas/schema.json",
              "-f", "json",
              "-n", "10024",
              "-w", "2000",
              "-d"
            ]
          env:
            - name: KAFKA_BROKERS
              value: "redpanda.materialize.svc.cluster.local:9092"
          volumeMounts:
            - name: datagen-schema-volume
              mountPath: /schemas
              readOnly: true
      volumes:
        - name: datagen-schema-volume
          configMap:
            name: datagen-schema

examples/kubernetes/redpanda.yaml

+66
@@ -0,0 +1,66 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redpanda
  namespace: materialize
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redpanda
  template:
    metadata:
      labels:
        app: redpanda
    spec:
      containers:
        - name: redpanda
          image: docker.vectorized.io/vectorized/redpanda:v23.3.5
          command: ["/usr/bin/rpk"]
          args: [
              "redpanda",
              "start",
              "--overprovisioned",
              "--smp", "1",
              "--memory", "1G",
              "--reserve-memory", "0M",
              "--node-id", "0",
              "--check=false",
              "--kafka-addr", "0.0.0.0:9092",
              "--advertise-kafka-addr", "redpanda.materialize.svc.cluster.local:9092",
              "--pandaproxy-addr", "0.0.0.0:8082",
              "--advertise-pandaproxy-addr", "redpanda.materialize.svc.cluster.local:8082",
              "--set", "redpanda.enable_transactions=true",
              "--set", "redpanda.enable_idempotence=true",
              "--set", "redpanda.auto_create_topics_enabled=true",
              "--set", "redpanda.default_topic_partitions=1"
            ]
          ports:
            - containerPort: 9092
            - containerPort: 8081
            - containerPort: 8082
          livenessProbe:
            httpGet:
              path: /v1/status/ready
              port: 9644
            initialDelaySeconds: 30
            periodSeconds: 10

---
apiVersion: v1
kind: Service
metadata:
  name: redpanda
  namespace: materialize
spec:
  selector:
    app: redpanda
  ports:
    - name: kafka
      protocol: TCP
      port: 9092
      targetPort: 9092
    - name: pandaproxy
      protocol: TCP
      port: 8082
      targetPort: 8082
