Answer:
Prometheus is an open-source monitoring and alerting system used to collect metrics from applications and infrastructure. It is widely used because of its pull-based model, powerful query language (PromQL), and time-series database capabilities.
Example Use Case:
- Monitoring CPU, memory, and network usage
- Collecting application performance metrics
- Alerting on high error rates or latency
Answer:
Prometheus pulls metrics from target endpoints exposed over HTTP at `/metrics`. Targets can be defined in a static configuration or discovered dynamically (e.g., Kubernetes service discovery).
Example scrape configuration (`prometheus.yml`):
```yaml
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
```
Answer:
PromQL (Prometheus Query Language) is used to query and analyze metrics stored in Prometheus. It enables users to create alerts, dashboards, and graphs.
Example Queries:
- CPU usage (percent of non-idle CPU, averaged over 5 minutes):
  ```promql
  100 * sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) / sum(rate(node_cpu_seconds_total[5m]))
  ```
- Request rate:
  ```promql
  rate(http_requests_total[5m])
  ```
Answer:
Exporters are agents that collect and expose metrics from various applications and systems.
Common Exporters:
- Node Exporter (system metrics)
- Blackbox Exporter (network probes)
- MySQL Exporter (database metrics)
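For example, Node Exporter can be run as a container and then scraped on port 9100 (a minimal sketch; the container name and flags are illustrative):
```bash
# Run Node Exporter; it exposes host metrics at :9100/metrics
docker run -d --name node-exporter -p 9100:9100 prom/node-exporter

# Verify the metrics endpoint
curl -s http://localhost:9100/metrics | head
```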
Answer:
Alerting rules are defined in a rules file (e.g., `alerting_rules.yml`) and evaluated by the Prometheus server; firing alerts are then forwarded to Alertmanager, which handles grouping, deduplication, and notification routing.
Example Rule:
```yaml
groups:
  - name: instance_down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          description: "Instance {{ $labels.instance }} is down."
```
Answer:
Grafana is an open-source analytics and visualization tool used to create interactive dashboards for monitoring data from Prometheus, ELK, and other sources.
Answer:
- Log in to Grafana (http://localhost:3000).
- Navigate to "Configuration" → "Data Sources".
- Select Prometheus as the data source type.
- Enter the Prometheus URL (http://localhost:9090).
- Click "Save & Test".
Answer:
Panels are visual components in Grafana used to display data in various formats:
- Graph Panel: Time-series data visualization
- Single Stat Panel: Displays a single numeric value
- Table Panel: Tabular data display
Answer:
- Select a panel.
- Click "Edit" → "Alert".
- Define a condition using PromQL queries.
- Set the evaluation interval (e.g., every 1m).
- Configure the alert notification (Slack, Email, etc.).
Answer:
Export and import dashboards using JSON files.
Example JSON snippet:
```json
{
  "panels": [
    {
      "type": "graph",
      "title": "CPU Usage",
      "targets": [
        { "expr": "node_cpu_seconds_total", "format": "time_series" }
      ]
    }
  ]
}
```
Answer:
The ELK Stack consists of:
- Elasticsearch (search and analytics engine)
- Logstash (log processing pipeline)
- Kibana (visualization tool)
Answer:
Elasticsearch is a NoSQL, distributed search engine used to store, search, and analyze log data.
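For example, documents can be indexed and searched over its REST API (a minimal sketch; the index name and fields are illustrative):
```bash
# Index a log document
curl -X POST "localhost:9200/logs/_doc" -H 'Content-Type: application/json' -d'
{ "level": "error", "message": "disk full", "@timestamp": "2024-01-01T12:00:00Z" }'

# Search for error logs
curl -X GET "localhost:9200/logs/_search?q=level:error&pretty"
```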
Answer:
Logstash processes logs using a pipeline:
- Input: Reads logs (from files, databases, Kafka, etc.)
- Filter: Transforms logs (parse JSON, remove sensitive data)
- Output: Sends logs to Elasticsearch or other storage
Example Logstash Configuration:
```conf
input  { file { path => "/var/log/syslog" } }
filter { grok { match => { "message" => "%{SYSLOGTIMESTAMP:timestamp}" } } }
output { elasticsearch { hosts => ["localhost:9200"] } }
```
Answer:
Kibana is used to visualize and explore log data stored in Elasticsearch. It provides features like:
- Dashboards: Custom data visualizations
- Discover: Search raw logs
- Alerts: Set up log-based alerts
Answer:
Install Elasticsearch, Logstash, and Kibana:
```bash
# Install Elasticsearch
sudo apt install elasticsearch
# Install Logstash
sudo apt install logstash
# Install Kibana
sudo apt install kibana
```
Start services:
```bash
sudo systemctl start elasticsearch logstash kibana
```
Answer:
An index in Elasticsearch is like a database table that stores documents.
Example:
```bash
curl -X PUT "localhost:9200/logs"
```
Answer:
Define an output plugin in Logstash configuration:
```conf
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```
Answer:
A Kibana Visualization is a graph, chart, or table displaying log data.
Example Visualizations:
- Bar Chart (Logs per hour)
- Pie Chart (Error types distribution)
- Line Chart (CPU usage over time)
Answer:
Filebeat is a lightweight log shipper that forwards logs to Logstash or Elasticsearch.
Example Filebeat Configuration:
```yaml
filebeat.inputs:
  - type: log
    paths:
      - "/var/log/syslog"

output.elasticsearch:
  hosts: ["localhost:9200"]
```
Answer:
- Logstash: Heavyweight, processes logs with complex transformations
- Filebeat: Lightweight, only forwards logs with minimal processing
Answer:
- Pull Model (Prometheus) → The monitoring system requests data from targets at regular intervals.
- Push Model (StatsD, InfluxDB) → The target system sends data to a central monitoring system.
Prometheus uses a pull model because it provides better control over scraping intervals, avoids data duplication, and reduces unnecessary load on monitored systems. However, in some cases (e.g., short-lived jobs), Prometheus Pushgateway can be used to support push-based metrics.
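For example, a short-lived batch job can push a metric to Pushgateway before exiting (a minimal sketch; the metric and job names are illustrative):
```bash
# Push a completion timestamp for a batch job to Pushgateway (default port 9091)
echo "batch_job_last_success_timestamp $(date +%s)" \
  | curl --data-binary @- http://localhost:9091/metrics/job/batch_job
```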
Answer:
Prometheus stores time-series data efficiently, but high-cardinality metrics (many unique label combinations) can cause excessive memory and storage usage. Best practices include:
- Avoid unnecessary labels (e.g., `user_id` or `request_id`); known high-cardinality labels can also be dropped at scrape time, as shown below.
- Use histograms and summaries instead of tracking individual events.
- Enable retention policies and downsampling for old data.
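A minimal sketch of dropping a high-cardinality label with `metric_relabel_configs` (the job, target, and label name are illustrative):
```yaml
scrape_configs:
  - job_name: 'app'
    static_configs:
      - targets: ['localhost:8080']
    metric_relabel_configs:
      # Drop the hypothetical high-cardinality request_id label from all series
      - action: labeldrop
        regex: request_id
```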
Answer:
Recording Rules allow precomputing and storing frequently used queries as new time-series metrics. This improves query performance.
Example:
```yaml
groups:
  - name: response_time_rules
    rules:
      - record: instance:response_time:avg
        expr: avg(rate(http_request_duration_seconds[5m]))
```
This stores the average request duration as `instance:response_time:avg`, making future queries faster.
Answer:
Thanos extends Prometheus for scalability, long-term storage, and high availability. It:
- Provides deduplication across multiple Prometheus instances.
- Enables object storage support (e.g., S3, GCS).
- Allows querying across multiple Prometheus servers via a single query layer.
Thanos is useful in multi-cluster environments where Prometheus instances are spread across multiple regions or clouds.
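A minimal sketch of a Thanos sidecar object-storage configuration (the bucket name and endpoint are illustrative):
```yaml
# objstore.yml, passed via: thanos sidecar --objstore.config-file=objstore.yml
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.us-east-1.amazonaws.com
```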
Answer:
Prometheus is a single-node system by design, but HA can be achieved by:
- Running multiple Prometheus replicas (scraping the same targets).
- Using Thanos or Cortex for deduplication and query federation.
- Storing time-series data externally (e.g., in S3, Bigtable).
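A minimal sketch of labeling two HA replicas so a query layer (e.g., Thanos) can deduplicate their data (the label names are conventional, not mandatory):
```yaml
# prometheus.yml on replica 0; replica 1 sets replica: prometheus-1
global:
  external_labels:
    cluster: prod
    replica: prometheus-0
```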
Answer:
Grafana supports multiple authentication methods:
- Basic authentication (default).
- OAuth providers (Google, GitHub, Azure AD, etc.).
- LDAP authentication for enterprise use.
To enable OAuth authentication, modify `grafana.ini`:
```ini
[auth.github]
enabled = true
client_id = YOUR_CLIENT_ID
client_secret = YOUR_CLIENT_SECRET
```
Answer:
Templating allows users to create dynamic dashboards by using variables. Instead of hardcoding values, users can select values from dropdown menus.
Example:
```promql
rate(http_requests_total{job="$service"}[5m])
```
Here, `$service` is a variable that can be selected from a dropdown list in Grafana.
Answer:
Grafana supports automated provisioning of dashboards and data sources using YAML configuration files.
Example `datasource.yaml`:
```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
```
Answer:
- Loki is Grafana's log aggregation system, similar to Elasticsearch but optimized for Kubernetes and microservices.
- Promtail is the log collection agent for pushing logs to Loki.
Promtail collects logs from `/var/log` and forwards them to Loki.
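A minimal Promtail configuration sketch (the Loki URL and log path are illustrative):
```yaml
positions:
  filename: /tmp/positions.yaml        # tracks read offsets per file

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log     # glob of files to tail
```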
Answer:
Use kube-prometheus-stack, which includes:
- Prometheus Operator (for Kubernetes metrics).
- Grafana dashboards for cluster monitoring.
- Node Exporter and Kube-State-Metrics for detailed node/pod-level metrics.
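A minimal sketch of installing it with Helm (the release and namespace names are illustrative):
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```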
Answer:
An Elasticsearch shard is a subdivision of an index. Each index is split into shards to allow parallel processing and redundancy.
- Primary Shards: Store original data.
- Replica Shards: Duplicates of primary shards for fault tolerance.
Example:
```bash
curl -X PUT "localhost:9200/logs?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": { "number_of_shards": 3, "number_of_replicas": 2 }
}'
```
This creates an index with 3 primary shards, each with 2 replicas (6 replica shards in total).
Answer:
ILM automates index retention policies, ensuring efficient storage use. Stages include:
- Hot Phase: Frequent reads/writes.
- Warm Phase: Less frequent queries.
- Cold Phase: Rarely accessed data.
- Delete Phase: Data deletion.
ILM is useful for managing log retention in ELK stacks.
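A minimal ILM policy sketch (the policy name and thresholds are illustrative):
```json
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": { "rollover": { "max_age": "7d", "max_size": "50gb" } }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```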
Answer:
Logstash uses a pipeline of input → filter → output.
Example `logstash.conf`:
```conf
input {
  beats {
    port => 5044
  }
}
filter {
  grok { match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}" } }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```
This pipeline processes logs from Filebeat → Logstash → Elasticsearch.
Answer:
- Canvas → Used for creating custom, highly stylized reports and presentations.
- Lens → Drag-and-drop interface for creating advanced visualizations easily.
Answer:
Enable security in `elasticsearch.yml`:
```yaml
xpack.security.enabled: true
```
Then configure Kibana's credentials in `kibana.yml`:
```yaml
elasticsearch.username: "kibana"
elasticsearch.password: "changeme"
```
Use role-based access control (RBAC) to restrict access.
Answer:
Beats are lightweight data shippers for sending logs, metrics, and security data to ELK.
- Filebeat: Log shipping.
- Metricbeat: System metrics.
- Packetbeat: Network monitoring.
Answer:
Curator is a tool for managing Elasticsearch indices, used for deleting old indices, snapshot backups, and optimizing performance.
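A minimal Curator action-file sketch that deletes `logs-*` indices older than 30 days (the prefix and age are illustrative):
```yaml
actions:
  1:
    action: delete_indices
    description: "Delete logs-* indices older than 30 days"
    options:
      ignore_empty_list: true
    filters:
      - filtertype: pattern
        kind: prefix
        value: logs-
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 30
```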
Answer:
Use Metricbeat to collect system metrics and send them to Elasticsearch, while Prometheus Node Exporter collects Prometheus-compatible metrics.
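A minimal Metricbeat sketch for system metrics (the metricsets and period are illustrative):
```yaml
metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory", "network"]
    period: 10s

output.elasticsearch:
  hosts: ["localhost:9200"]
```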
Answer:
A slow query is a query that takes too long to execute, often due to large data scans or missing indexes. Enable slow query logs to debug:
```json
PUT _settings
{
  "index.search.slowlog.threshold.query.warn": "2s"
}
```
Answer:
- Prometheus + Grafana → Metrics-based monitoring.
- ELK Stack (Elasticsearch, Logstash, Kibana) → Log-based monitoring.
- Alternative: OpenTelemetry, Loki, and InfluxDB.
Answer:
Prometheus is a single-node system, so for large environments:
- Use multiple Prometheus instances scraping different targets.
- Federation: Create a parent Prometheus that scrapes aggregated metrics from child Prometheus instances (see the sketch below).
- Remote storage: Use Thanos, Cortex, or Mimir to store metrics in scalable object storage (S3, GCS).
- Sharding: Distribute scrape targets across Prometheus instances (e.g., with `hashmod` relabeling), typically deployed as Kubernetes StatefulSets.
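A minimal federation sketch, where a parent Prometheus scrapes selected series from child instances (the match selector and targets are illustrative):
```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="node_exporter"}'
    static_configs:
      - targets: ['prometheus-child-1:9090', 'prometheus-child-2:9090']
```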
Answer:
- Stale markers: Prometheus marks time-series data as stale if a target stops reporting metrics.
- Absent function (`absent()`): Used in PromQL to detect missing metrics.
- Dead Man's Switch: A constant alert (e.g., `ALWAYS_ON`) ensures the alerting system is functional.
Example:
```promql
absent(up{job="my_service"})
```
Triggers an alert if `up{job="my_service"}` is missing.
Answer:
The Write-Ahead Log (WAL) in Prometheus:
- Stores data on disk before committing it to TSDB (Time-Series Database).
- Reduces data loss during crashes.
- WAL files are stored in `/data/wal/` and help recover metrics quickly after a restart.
Answer:
Both are used for measuring latency and response time:
- Histogram: Buckets data into predefined ranges, allowing percentiles to be calculated later.
- Summary: Precomputes percentiles but cannot be aggregated across instances.
Example (Histogram metric):
```promql
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```
This calculates the 95th percentile response time.
Answer:
- Enable authentication & TLS via a reverse proxy (Nginx, Traefik).
- Use RBAC (Role-Based Access Control) in Kubernetes for limiting access.
- Set up network policies to restrict Prometheus access.
Example: using basic auth with Nginx as a reverse proxy in front of Prometheus (assumes Prometheus itself listens on an internal address such as `127.0.0.1:9091`):
```nginx
server {
    listen 9090;
    location / {
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:9091;  # internal Prometheus listener
    }
}
```
Answer:
- Enable the built-in Prometheus self-metrics endpoint (`/metrics`).
- Use dashboards to monitor scrape latency, TSDB memory usage, and query duration.
- Use the federation endpoint (`/federate`) to collect meta-metrics centrally, as in the sketch below.
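A minimal sketch of Prometheus scraping its own metrics endpoint, with an example self-metric query:
```yaml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```
```promql
rate(prometheus_tsdb_head_samples_appended_total[5m])
```
The query shows the sample ingestion rate, a useful health signal.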
Answer:
Annotations mark events (deployments, incidents, downtimes) on Grafana graphs for better visualization.
Example: Mark a Kubernetes deployment event in Grafana.
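A minimal sketch of creating an annotation through Grafana's HTTP API (the token, tag, and text are illustrative):
```bash
curl -X POST http://localhost:3000/api/annotations \
  -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tags": ["deployment"], "text": "Deployed my-app v2.1.0"}'
```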
Answer:
- Organizations: Create multiple teams with separate dashboards.
- Data source permissions: Restrict access at the data-source level.
- Multi-instance deployment: Run separate Grafana instances for different teams.
Answer:
- Grafana alerts monitor query conditions.
- Alert states: OK, Pending, Alerting, No Data.
- Notification channels: Slack, PagerDuty, Email, Webhooks.
Example Grafana alert condition:
```promql
avg(http_requests_total) > 1000
```
This sends an alert if the average request count exceeds 1000.
Answer:
| Feature | Loki | Elasticsearch |
|---|---|---|
| Storage | Compressed logs | Full-text index |
| Querying | Label-based | Query DSL |
| Performance | Lightweight (optimized for Kubernetes) | Heavy resource usage |
Loki is recommended for lightweight, Kubernetes-native logging, while Elasticsearch is better for complex log analysis.
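For comparison, a minimal LogQL sketch (the label and filter are illustrative):
```logql
{app="nginx"} |= "error"
rate({app="nginx"} |= "error" [5m])
```
The first query returns nginx logs containing "error"; the second computes their rate over 5 minutes.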
Answer:
This strategy optimizes storage cost:
- Hot Nodes → Store recent, frequently queried data.
- Warm Nodes → Store older logs with infrequent access.
- Cold Nodes → Store archived logs for long-term retention.
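One common sketch uses custom node attributes plus index allocation settings (the attribute name and index are illustrative; newer Elasticsearch versions provide dedicated `data_hot`/`data_warm`/`data_cold` node roles instead):
```yaml
# elasticsearch.yml on a hot node
node.attr.data: hot
```
```json
PUT logs-2024.01.01/_settings
{
  "index.routing.allocation.require.data": "hot"
}
```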
Answer:
- Use ILM (Index Lifecycle Management).
- Optimize shard count (Avoid too many small shards).
- Increase refresh intervals (`index.refresh_interval: 30s`).
Answer:
- Persistent Queues → Buffer data before sending to Elasticsearch.
- Dead Letter Queue (DLQ) → Stores failed events for reprocessing.
Example:
```yaml
queue.type: persisted
queue.max_bytes: 1gb
```
Answer:
- Node query cache: Caches the results of filter clauses.
- Shard request cache: Caches full search responses such as aggregations.
- Fielddata cache: Speeds up sorting and aggregations on text fields.
Answer:
- Machine Learning Jobs → Identify unusual trends in logs.
- SIEM (Security Information and Event Management) → Detect security threats.
Example anomaly detection job:
```json
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [{ "function": "mean", "field_name": "cpu_usage" }]
  }
}
```
Answer:
- Enable TLS and security features (`xpack.security.enabled: true`).
- Use API key authentication.
- Implement firewall rules to restrict access.
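A minimal sketch of creating an API key (the name and expiration are illustrative):
```json
POST /_security/api_key
{
  "name": "monitoring-key",
  "expiration": "30d"
}
```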
Answer:
- Use Metricbeat to push Prometheus data into Elasticsearch.
- Use Grafana to visualize both Prometheus & ELK logs.
Example Metricbeat configuration:
```yaml
metricbeat.modules:
  - module: prometheus
    metricsets: ["collector"]
    hosts: ["localhost:9090"]
    period: 10s
```
Answer:
- Use filters (`term`, `match_phrase`) instead of full-text search.
- Avoid wildcard (`*`) searches.
- Use `doc_values` for sorting and aggregations.
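A minimal filtered-query sketch (the index, field names, and time range are illustrative):
```json
GET logs/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "error" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  }
}
```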
Answer:
- Use Fluentd/Filebeat to collect logs.
- Send logs to Elasticsearch or Loki.
- Monitor logs via Kibana or Grafana dashboards.
Example Fluentd configuration:
<match kubernetes.**>
@type elasticsearch
host elasticsearch
logstash_format true
</match>
Answer:
- Use ILM to delete old logs automatically.
- Encrypt sensitive logs (`xpack.security`).
- Mask PII data before indexing logs.
- Enable audit logging for security compliance.
💡 Want to contribute?
We welcome contributions! If you have insights, new tools, or improvements, feel free to submit a pull request.
📌 How to Contribute?
- Read the CONTRIBUTING.md guide.
- Fix errors, add missing topics, or suggest improvements.
- Submit a pull request with your updates.
📢 Stay Updated:
⭐ Star the repository to get notified about new updates and additions.
💬 Join discussions in GitHub Issues to suggest improvements.
🔗 GitHub: @NotHarshhaa
📝 Blog: ProDevOpsGuy
💬 Telegram Community: Join Here