feat(docs): update docs (#567)
* update usage/

* accounts-db fixes

* finish update db docs

* fix geyser docs

* fix other docs

* fix geyser docs

* fix geyser docs

* more fixes

* fix main readme

* fix

* fix docs
0xNineteen authored Feb 14, 2025
1 parent 3f6ab00 commit 50a4f20
Showing 40 changed files with 1,102 additions and 3,882 deletions.
13 changes: 7 additions & 6 deletions docs/check.py
@@ -4,20 +4,19 @@
import generate as g

# checks if the docs folder is up to date with the source readme.md files
# NOTE: only supports either `python docs/check.py .` OR `python check.py ../`
if __name__ == "__main__":
    arg_parser = argparse.ArgumentParser()
    arg_parser.add_argument("src_dir")
    args = arg_parser.parse_args()

    exclude_dirs = [
-        args.src_dir + "docs", # dont search yourself
-        args.src_dir + "data", # this should only include data
+        os.path.join(args.src_dir, "docs"), # dont search yourself
+        os.path.join(args.src_dir, "data"), # this should only include data
    ]

-    code_path = os.path.join(args.src_dir, "docs/docusaurus/docs/code")
-    for name, src_path in g.get_markdown_files(args.src_dir, exclude_dirs):
-        docs_path = os.path.join(code_path, name + ".md")
+    doc_dir_path = os.path.join(args.src_dir, "docs/docusaurus/docs")
+    for src_path, docs_path in g.get_markdown_files(args.src_dir, exclude_dirs, doc_dir_path):
        # check to see if the files are the same !
        with open(src_path, "r") as src_f:
            with open(docs_path, "r") as docs_f:

@@ -38,3 +37,5 @@
                print("Docs:", docs_lines[i])
                break
        exit(1)
+
+    print("Docs folder is up to date!")
@@ -1,9 +1,9 @@
---
-sidebar_position: 4
-title: Join Us
+sidebar_position: 10
+title: Join The Team
---

If you are a talented engineer who thrives in a collaborative and fast-paced environment,
and you're excited about contributing to the advancement of Solana's ecosystem, we would love to hear from you.

See our current openings [here](https://jobs.ashbyhq.com/syndica).
381 changes: 157 additions & 224 deletions docs/docusaurus/docs/code/accountsdb.md

Large diffs are not rendered by default.

104 changes: 69 additions & 35 deletions docs/docusaurus/docs/code/geyser.md
@@ -5,80 +5,114 @@ it's only used to stream accounts from a snapshot while loading/validating a snapshot

The main code is located in `/src/geyser/`.

-`lib.zig` and contains a few key structs:
+`lib.zig` contains a few key structs:
- `GeyserWriter`: used to write new accounts
- `GeyserReader`: used to read new accounts

-both use linux pipes to stream data. this involves
+Linux pipes are used to stream data. This involves
opening a file-based pipe using the `mkfifo` syscall which is then
-written to like any other file. the key method used to setup
+written to like any other file. The key method used to setup
the pipes is `openPipe` in `src/geyser/core.zig`.

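To make the pattern concrete, here is a minimal sketch of what an `openPipe`-style helper does (illustrative only; the actual implementation lives in `src/geyser/core.zig`):

```zig
const std = @import("std");

// Minimal sketch of the mkfifo-then-open pattern (illustrative only;
// see `openPipe` in `src/geyser/core.zig` for the real implementation).
fn openPipeSketch(path: [*:0]const u8) !std.fs.File {
    // create the named pipe; if it already exists, mkfifo fails and we reuse it
    _ = std.c.mkfifo(path, 0o666);
    // once created, the pipe is opened and written to like any other file
    return std.fs.cwd().openFileZ(path, .{ .mode = .read_write });
}
```
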
-## cli commands
+## Usage

-while running, grafana stats will be available. the main binary code is in
-`src/geyser/main.zig`
-
-### benchmarking
-
-we also have benchmarking to measure the throughput of geyser. you can run it using
+The main binary code is in `src/geyser/main.zig` and can be used to
+read accounts during account validation using:

```bash
-zig build -Doptimize=ReleaseSafe
-./zig-out/bin/benchmark geyser
-```
-
-you can also benchmark an dummy reader
-
-```bash
-# in terminal 1 -- read the snapshot accounts to geyser
-./zig-out/bin/sig snapshot-validate -g data/genesis-files/testnet_genesis.bin --enable-geyser -a 250 -t 2
+# run the snapshot validator with geyser enabled
+./zig-out/bin/sig snapshot-validate --enable-geyser &

-# in terminal 2 -- benchmark how fast you can read
-./zig-out/bin/geyser benchmark
+# run the geyser reader which dumps the accounts to a csv file
+./zig-out/bin/geyser csv
```

-### dump a snapshot to csv
+Metrics are also available to understand how fast data is being written/read, which can
+be viewed from the grafana dashboard (see the `metrics` docs for more details).

-after downloading a snapshots, you can dump the accounts to a csv using the
-csv geyser command, for example:
+## Csv File Dumping
+
+After downloading a snapshot, you can dump the accounts to a csv using the
+`csv` geyser command:

```bash
# in terminal 1 -- read the snapshot accounts to geyser
-./zig-out/bin/sig snapshot-validate -g data/genesis-files/testnet_genesis.bin --enable-geyser -a 250 -t 2
+./zig-out/bin/sig snapshot-validate -n testnet --enable-geyser

# in terminal 2 -- dump accounts to a csv (validator/accounts.csv)
./zig-out/bin/geyser csv
# OR dump only specific account owners (ie, the drift program)
./zig-out/bin/geyser csv -o dRiftyHA39MWEi3m9aunc5MzRF1JYuBsbn6VPcn33UH
```

-## Architecture
+## Benchmarks

-### how data is written/read
+We also have benchmarking to measure the throughput of geyser. You can run it using:

-currently, data is serialized and written through the pipe using `bincode`
```bash
zig build -Doptimize=ReleaseSafe
./zig-out/bin/benchmark geyser
```

-data is organized to be written as `[size, serialized_data]`
*Note*: due to output formatting, geyser results are off by default; to turn them on,
you will need to change the logger to use the `debug` level:

-where `size` is the full length of the `serialized_data`
```zig
var std_logger = sig.trace.DirectPrintLogger.init(
    allocator,
-    .info,
+    .debug,
);
```

You can also benchmark a dummy reader with production data:

```bash
# run the snapshot validator with geyser enabled
./zig-out/bin/sig snapshot-validate --enable-geyser &

# benchmark how fast you can read (data is read and discarded)
./zig-out/bin/geyser benchmark
```

-this allows for more efficient buffered reads where you can read the first 8 bytes in
## Architecture

### How Data is Read/Written

Currently, data is serialized and written through the pipe using `bincode`, as it is a simple and efficient
encoding format already used in the repo (future work can use faster encoding schemes if required).

Data is organized to be written as `[size, serialized_data]`
where `size` is the full length of the `serialized_data`.

This allows for more efficient buffered reads where you can read the first 8 bytes in
the pipe, cast to a u64, allocate a buffer of that size and then read the rest of
-the data associated with that payload.
the data associated with that payload:

-the key struct used is `AccountPayload` which uses a versioned system to support different payload types (`VersionedAccountPayload`) while also being backwards compatibility.
```zig
/// reads a payload from the pipe and returns the total bytes read with the data
pub fn readPayload(self: *GeyserReader) !struct { u64, VersionedAccountPayload } {
    const len = try self.readType(u64, 8);
    const versioned_payload = try self.readType(VersionedAccountPayload, len);
    return .{ 8 + len, versioned_payload };
}
```

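For reference, the write side mirrors this framing. Below is a hedged sketch of what a write-side counterpart could look like (the helper name and the `sig.bincode.writeAlloc` call are assumptions, not the repo's confirmed API):

```zig
// Hypothetical write-side counterpart of `readPayload` (sketch only):
// frame the payload as [size, serialized_data] before writing to the pipe.
fn writePayloadSketch(
    allocator: std.mem.Allocator,
    pipe: std.fs.File,
    payload: VersionedAccountPayload,
) !void {
    // serialize the payload (assumes a bincode helper that allocates)
    const bytes = try sig.bincode.writeAlloc(allocator, payload, .{});
    defer allocator.free(bytes);

    // write the u64 size prefix first, then the serialized data
    var size_buf: [8]u8 = undefined;
    std.mem.writeInt(u64, &size_buf, bytes.len, .little);
    try pipe.writeAll(&size_buf);
    try pipe.writeAll(bytes);
}
```
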
-### GeyserWriter
The key struct used is `AccountPayload` which uses a versioned system to support
different payload types (`VersionedAccountPayload`) while also remaining backwards compatible.

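The versioning pattern can be pictured as a tagged union, sketched below with hypothetical variant names (the real definitions live in `src/geyser/lib.zig`):

```zig
// Illustrative shape of a versioned payload (variant names are hypothetical):
// the version tag is encoded with the payload, so readers can dispatch on it
// while older payload layouts keep decoding correctly.
const VersionedPayloadSketch = union(enum(u8)) {
    v1: struct { pubkey: [32]u8, data: []const u8 },
    v2: struct { pubkey: [32]u8, data: []const u8, slot: u64 },
};
```
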
-![](/img/2024-08-07-17-27-36.png)
### Geyser Writer

#### IO Thread

-the writer uses a separate thread to write to the pipe due to expensive i/o operations.
The GeyserWriter uses a separate thread to write to the pipe due to expensive i/o operations.
To spawn this thread, use the `spawnIOLoop` method.

-it loops, draining the channel for payloads with type (`[]u8`) and then writes the bufs to the pipe and then frees the payload using the `RecycleFBA`
It loops, draining the channel for payloads of type `[]u8`, writing the bufs to the
pipe and then freeing the payload using the `RecycleFBA`.

#### RecycleFBA

@@ -102,7 +136,7 @@ records field.

When free is called, we find the buffer in the records and set the record's `is_free = true`.

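As a conceptual sketch of that free-list behavior (hypothetical names; not the actual `RecycleFBA` API):

```zig
const std = @import("std");

// Conceptual sketch of the recycle pattern (hypothetical names): alloc
// reuses the first free record that is large enough, and free just flips
// the record's flag instead of returning memory to the backing allocator.
const RecycleSketch = struct {
    const Record = struct { is_free: bool, buf: []u8 };
    records: std.ArrayList(Record),

    fn alloc(self: *RecycleSketch, n: usize) ?[]u8 {
        for (self.records.items) |*record| {
            if (record.is_free and record.buf.len >= n) {
                record.is_free = false;
                return record.buf[0..n];
            }
        }
        return null; // caller falls back to a fresh allocation
    }

    fn free(self: *RecycleSketch, buf: []u8) void {
        // find the buffer in the records and mark it reusable
        for (self.records.items) |*record| {
            if (record.buf.ptr == buf.ptr) {
                record.is_free = true;
                return;
            }
        }
    }
};
```
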
-#### usage
#### Usage

```zig
// setup writer
    // ...
```
86 changes: 67 additions & 19 deletions docs/docusaurus/docs/code/gossip.md
@@ -8,39 +8,87 @@ For an introduction to Solana's gossip protocol, check out the technical section

Checkout the full engineering blog post here: [https://blog.syndica.io/sig-engineering-1-gossip-protocol/](https://blog.syndica.io/sig-engineering-1-gossip-protocol/).

## Repository File Outline

-- `service.zig`: main logic for reading, processing, and sending gossip messages
The main struct files include:
- `service.zig`: reading, processing, and sending gossip messages
- `table.zig`: where gossip data is stored
-- `data.zig`: various gossip data structure definitions
- `data.zig`: various gossip data definitions
- `pull_request.zig`: logic for sending pull *requests*
- `pull_response.zig`: logic for sending pull *responses* (/handling incoming pull requests)
-- `gossip_shards.zig`: datastructure which stores gossip data hashes for quick lookup - used in `gossip_table` and constructing pull responses
- `gossip_shards.zig`: datastructure which stores gossip data hashes for quick lookup (used in `gossip_table` and constructing pull responses)
- `active_set.zig`: logic for deriving a list of peers to send push messages to
- `ping_pong.zig`: logic for sending ping/pong messages as a heartbeat check

-A gossip spy is, in essence, software written to do two things: store data and send/receive requests.
Other files include:
- `fuzz_service.zig`: a fuzzing client for testing the gossip service
- `fuzz_table.zig`: a fuzzing client for testing the gossip table

## Usage

Simple usage of the gossip service is as follows:

```zig
const service = try GossipService.create(
    // general allocator
    std.heap.page_allocator,
    // allocator specifically for gossip values
    std.heap.page_allocator,
    // information about the current node to share with the network (via gossip)
    contact_info,
    // keypair for signing messages
    my_keypair,
    // entrypoints to discover peers
    entrypoints,
    // logger
    logger,
);
// start the gossip service (ie, spin up the threads
// to process and generate messages)
try service.start(.{
    .spy_node = false,
    .dump = false,
});
```

*Note:* a `spy_node` is a node that listens to gossip messages but does not send any.
This is useful for debugging and monitoring the network.

*Note:* `dump` is a flag to print out the gossip table to a file every 10 seconds
(see `dump_service.zig` for more).

*Note:* for an easy-to-use example, see `initGossipFromCluster` in `helpers.zig`.

## Benchmarks

-benchmarks are located at the bottom of `service.zig`.
-to run the benchmarks:
-- build sig in `ReleaseSafe` (ie, `zig build -Doptimize=ReleaseSafe`)
-- run `./zig-out/bin/benchmark gossip`
Benchmarks are located at the bottom of `service.zig`:
- `BenchmarkGossipServiceGeneral`: benchmarks ping, push, and pull response
  messages
- `BenchmarkGossipServicePullRequest`: benchmarks pull request messages (which require
  a bit more work to construct)

-this includes processing times for pings, push messages, pull responses, and
-pull requests.
You can run both benchmarks using: `./zig-out/bin/benchmark gossip`.

## Fuzzing

-the fuzzing client is located in `fuzz.zig`.
We support two fuzzing options:
- `fuzz_service.zig`: fuzzing the gossip service
- `fuzz_table.zig`: fuzzing the gossip table

### Fuzzing the Service

```bash
zig build -Dno-run fuzz

fuzz gossip_service <seed> <number_of_actions>
```

### Fuzzing the Table

```bash
zig build -Dno-run fuzz

-to run the client
-- start a sig gossip in a terminal (ie, listening on `8001`)
-- build the fuzz client in `ReleaseSafe` (ie, `zig build -Doptimize=ReleaseSafe`)
-- run the fuzz client pointing to sig with some seed and some number of random messages
-to send: `./zig-out/bin/fuzz <entrypoint> <seed> <num_messages>` (eg, `./zig-out/bin/fuzz 127.0.0.1:8001 19 100000`)
fuzz gossip_table <seed> <number_of_actions>
```

## Architecture

16 changes: 8 additions & 8 deletions docs/docusaurus/docs/contributing/_category_.json
@@ -1,10 +1,10 @@
{
-  "position": 4,
-  "label": "Contributing",
-  "collapsible": true,
-  "collapsed": true,
-  "className": "red",
-  "customProps": {
-    "description": "How to contribute to this project"
-  }
+  "position": 5,
+  "label": "Contributing",
+  "collapsible": true,
+  "collapsed": true,
+  "className": "red",
+  "customProps": {
+    "description": "Tools to use throughout the repository"
+  }
}
47 changes: 0 additions & 47 deletions docs/docusaurus/docs/contributing/dev-tools.mdx

This file was deleted.
