Commit 660e96b

Update development environment docs (HemeraProtocol#185)
1 parent d1f6299 commit 660e96b

3 files changed

+111
-55
lines changed

Diff for: README.md

+86-19
@@ -17,7 +17,9 @@ As of July 5, 2024, the initial open-source version of the Hemera Indexer offers
1717

1818
## Features Offered
1919

20-
##### Export the following entities
20+
#### Exportable Entities
21+
22+
The system can export the following entities:
2123

2224
- Blocks
2325
- Transactions
@@ -33,13 +35,80 @@ As of July 5, 2024, the initial open-source version of the Hemera Indexer offers
3335
- DA Transactions
3436
- User Operations
3537

36-
##### Into the following formats
38+
#### Supported Export Formats
39+
40+
The data can be exported into the following formats:
3741

3842
- PostgreSQL SQL
3943
- JSONL
4044
- CSV
4145

42-
##### Additional features
46+
#### Output Types and Entity Types Explanation
47+
48+
##### Entity Types
49+
50+
Entity Types are high-level categories that group related data models. They are defined in the `EntityType` enum and can be combined using bitwise operations.
51+
52+
##### Key Points:
53+
- Specified using the `-E` or `--entity-types` option
54+
- Examples: EXPLORER_BASE, EXPLORER_TOKEN, EXPLORER_TRACE, etc.
55+
- On the command line, multiple types are combined with commas (for example, `EXPLORER_BASE,EXPLORER_TOKEN`)
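
To illustrate how a flag-style enum can support both bitwise combination in code and comma-separated selection on the command line, here is a minimal sketch. The member values and the `parse_entity_types` helper are assumptions made for illustration; the actual `EntityType` definition in the codebase may differ.

```python
from enum import IntFlag
from functools import reduce


class EntityType(IntFlag):
    # Hypothetical values; the real enum in the codebase may use different ones.
    EXPLORER_BASE = 1
    EXPLORER_TOKEN = 2
    EXPLORER_TRACE = 4


def parse_entity_types(raw: str) -> EntityType:
    """Turn 'EXPLORER_BASE,EXPLORER_TOKEN' into one combined flag value."""
    names = [name.strip().upper() for name in raw.split(",") if name.strip()]
    if not names:
        raise ValueError("no entity types given")
    # EntityType[name] raises KeyError for unknown names.
    return reduce(lambda acc, name: acc | EntityType[name], names, EntityType(0))


selected = parse_entity_types("EXPLORER_BASE,EXPLORER_TOKEN")
# Membership tests on the combined flags.
print(EntityType.EXPLORER_TOKEN in selected)
print(EntityType.EXPLORER_TRACE in selected)
```

Keeping the combined selection as a single flag value makes membership checks cheap, which is one plausible reason to define entity types this way.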
56+
57+
##### Output Types
58+
59+
Output Types correspond to more detailed data models and are typically associated with specific Entity Types.
60+
61+
##### Key Points:
62+
- Specified using the `-O` or `--output-types` option
63+
- Examples: Block, Transaction, Log, Token, AddressTokenBalance, etc.
64+
- Takes precedence over Entity Types if specified
65+
- Directly corresponds to data class names in the code (Domain)
66+
67+
##### Relationship between Entity Types and Output Types
68+
69+
1. Entity Types are used to generate a set of Output Types:
70+
- The `generate_output_types` function maps Entity Types to their corresponding Output Types.
71+
- Each Entity Type yields a set of related data classes (Output Types).
72+
73+
2. When specifying Output Types directly:
74+
- The specified Output Types override the Entity Type selection.
75+
- Allows for more granular control over the exported data.
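
The relationship described above can be sketched roughly as follows. This is not the project's `generate_output_types` implementation: the function names, signatures, and the grouping of data classes under each entity type are assumptions chosen to mirror the examples in this README.

```python
from typing import Iterable, Optional, Set

# Hypothetical mapping; which data classes belong to which entity type is assumed.
ENTITY_TO_OUTPUT_TYPES = {
    "EXPLORER_BASE": {"Block", "Transaction", "Log"},
    "EXPLORER_TOKEN": {"Token", "ERC20TokenTransfer"},
}


def sketch_generate_output_types(entity_types: Iterable[str]) -> Set[str]:
    """Union the output types of every selected entity type."""
    output_types: Set[str] = set()
    for entity_type in entity_types:
        output_types |= ENTITY_TO_OUTPUT_TYPES[entity_type]
    return output_types


def resolve_output_types(
    entity_types: Iterable[str],
    output_types: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Explicit output types win; otherwise derive them from the entity types."""
    explicit = set(output_types or [])
    if explicit:
        return explicit
    return sketch_generate_output_types(entity_types)


# Derived from entity types only.
print(sorted(resolve_output_types(["EXPLORER_BASE", "EXPLORER_TOKEN"])))
# Explicit output types override the entity type selection.
print(sorted(resolve_output_types(["EXPLORER_BASE"], ["Block", "Transaction", "Token"])))
```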
76+
77+
#### Output Types and Data Classes
78+
79+
When using the `--output-types` option, specify names that correspond directly to the data class names in the code. For example:
80+
81+
```
82+
--output-types Block,Transaction,Log,Token,ERC20TokenTransfer
83+
```
84+
85+
These names must match the data class definitions in the codebase exactly. Output Types are essentially the data class names themselves, which allows precise selection of the data models you wish to export.
86+
87+
#### Usage Examples
88+
89+
1. Using Entity Types:
90+
```
91+
--entity-types EXPLORER_BASE,EXPLORER_TOKEN
92+
```
93+
This will generate Output Types including Block, Transaction, Log, Token, ERC20TokenTransfer, etc.
94+
95+
2. Using Output Types:
96+
```
97+
--output-types Block,Transaction,Token
98+
```
99+
This will only generate the specified Output Types, regardless of Entity Types.
100+
101+
#### Note
102+
103+
When developing or using this system, consider the following:
104+
- Entity Types provide a broader, category-based selection of data.
105+
- Output Types offer more precise control over the exact data models to be exported.
106+
- The choice between using Entity Types or Output Types depends on the specific requirements of the data export task.
107+
108+
110+
111+
#### Additional features
43112

44113
- Ability to select arbitrary block ranges for more flexible data indexing
45114
- Option to choose any entities for targeted data extraction
@@ -232,23 +301,22 @@ Follow the instructions about how to set up a PostgreSQL database here: [Setup P
232301

233302
Configure the `OUTPUT` or `--output` parameter according to your PostgreSQL role information. Check out [Configure Hemera Indexer](#output-or---output) for details.
234303

235-
E.g. `postgresql+psycopg2://${YOUR_USER}:${YOUR_PASSWORD}@${YOUR_HOST}:5432/${YOUR_DATABASE}`.
304+
E.g. `postgresql://${YOUR_USER}:${YOUR_PASSWORD}@${YOUR_HOST}:5432/${YOUR_DATABASE}`.
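
If the password contains characters such as `@` or `:`, it generally needs to be percent-encoded before it is placed in a connection URL. The sketch below shows one way to assemble such a URL; `build_postgres_url` is a hypothetical helper, and how the indexer itself parses the URL is not confirmed here.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, database: str, port: int = 5432) -> str:
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    # safe="" makes quote() escape every reserved character, including "@", ":" and "/".
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


print(build_postgres_url("devuser", "p@ss:word", "localhost", "hemera_indexer"))
```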
236305

237306
#### Run
238307

239308
Please check out [Configure Hemera Indexer](#configure-hemera-indexer) on how to configure the indexer.
240309

241310
```bash
242311
python hemera.py stream \
243-
--provider-uri https://eth.llamarpc.com \
244-
--debug-provider-uri https://eth.llamarpc.com \
245-
--postgres-url postgresql+psycopg2://devuser:devpassword@localhost:5432/hemera_indexer \
246-
--output jsonfile://output/eth_blocks_20000001_20010000/json,csvfile://output/hemera_indexer/csv,postgresql+psycopg2://devuser:devpassword@localhost:5432/eth_blocks_20000001_20010000 \
312+
--provider-uri https://ethereum.publicnode.com \
313+
--postgres-url postgresql://devuser:devpassword@localhost:5432/hemera_indexer \
314+
--output jsonfile://output/eth_blocks_20000001_20010000/json,csvfile://output/hemera_indexer/csv,postgresql://devuser:devpassword@localhost:5432/eth_blocks_20000001_20010000 \
247315
--start-block 20000001 \
248316
--end-block 20010000 \
249317
# alternatively you can spin up a separate process for traces, as it takes more time
250318
# --entity-types trace,contract,coin_balance
251-
--entity-types block,transaction,log,token,token_transfer \
319+
--entity-types EXPLORER_BASE \
252320
--block-batch-size 200 \
253321
--batch-size 200 \
254322
--max-workers 32
@@ -308,19 +376,18 @@ E.g., If you specify the `OUTPUT` or `--output` parameter as below
308376
```bash
309377
# Command line parameter
310378
python hemera.py stream \
311-
--provider-uri https://eth.llamarpc.com \
312-
--debug-provider-uri https://eth.llamarpc.com \
313-
--postgres-url postgresql+psycopg2://devuser:devpassword@localhost:5432/hemera_indexer \
314-
--output jsonfile://output/eth_blocks_20000001_20010000/json,csvfile://output/hemera_indexer/csv,postgresql+psycopg2://devuser:devpassword@localhost:5432/eth_blocks_20000001_20010000 \
379+
--provider-uri https://ethereum.publicnode.com \
380+
--postgres-url postgresql://devuser:devpassword@localhost:5432/hemera_indexer \
381+
--output jsonfile://output/eth_blocks_20000001_20010000/json,csvfile://output/hemera_indexer/csv,postgresql://devuser:devpassword@localhost:5432/eth_blocks_20000001_20010000 \
315382
--start-block 20000001 \
316383
--end-block 20010000 \
317-
--entity-types block,transaction,log,token,token_transfer \
384+
--entity-types EXPLORER_BASE \
318385
--block-batch-size 200 \
319386
--batch-size 200 \
320387
--max-workers 32
321388

322389
# Or using environment variable
323-
export OUTPUT = postgresql+psycopg2://user:password@localhost:5432/hemera_indexer,jsonfile://output/json, csvfile://output/csv
390+
export OUTPUT=postgresql://user:password@localhost:5432/hemera_indexer,jsonfile://output/json,csvfile://output/csv
324391
```
325392

326393
You will be able to find those results in the `output` folder of your current location.
@@ -367,7 +434,7 @@ The URI of the web3 debug rpc provider, e.g. `file://$HOME/Library/Ethereum/geth
367434
#### `POSTGRES_URL` or `--postgres-url` or `-pg`
368435

369436
[**Required**]
370-
The PostgreSQL connection URL that the Hemera Indexer used to maintain its state. e.g. `postgresql+psycopg2://user:[email protected]:5432/postgres`.
437+
The PostgreSQL connection URL that the Hemera Indexer uses to maintain its state, e.g. `postgresql://user:[email protected]:5432/postgres`.
371438

372439
#### `OUTPUT` or `--output` or `-o`
373440

@@ -379,19 +446,19 @@ The file location will be relative to your current location if you run from sour
379446

380447
e.g.
381448

382-
- `postgresql+psycopg2://user:password@localhost:5432/hemera_indexer`: Output will be exported to your postgres.
449+
- `postgresql://user:password@localhost:5432/hemera_indexer`: Output will be exported to your PostgreSQL database.
383450
- `jsonfile://output/json`: JSON files will be exported to the `output/json` folder.
384451
- `csvfile://output/csv`: CSV files will be exported to the `output/csv` folder.
385452
- `console,jsonfile://output/json,csvfile://output/csv`: Multiple destinations are supported.
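
As a rough illustration of how a comma-separated `OUTPUT` value breaks down into individual destinations, consider the sketch below. `split_output_targets` is a hypothetical helper, not the indexer's own parser, and it assumes that no destination (for example a database password) contains a comma.

```python
from typing import List, Tuple


def split_output_targets(output: str) -> List[Tuple[str, str]]:
    """Split an OUTPUT string into (scheme, location) pairs."""
    targets = []
    for item in output.split(","):
        item = item.strip()
        if not item:
            continue
        scheme, separator, location = item.partition("://")
        # A bare word such as "console" has no "://" and therefore no location.
        targets.append((scheme, location if separator else ""))
    return targets


for scheme, location in split_output_targets("console,jsonfile://output/json,csvfile://output/csv"):
    print(scheme, location)
```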
386453

387454
#### `ENTITY_TYPES` or `--entity-types` or `-E`
388455

389-
[**Default**: `EXPLORER_BASE,EXPLORER_TOKEN`]
456+
[**Default**: `EXPLORER_BASE`]
390457
The list of entity types to export. e.g. `EXPLORER_BASE`, `EXPLORER_TOKEN`, `EXPLORER_TRACE`.
391458

392459
#### `OUTPUT_TYPES` or `--output-types` or `-O`
393460

394-
The list of output types to export, corresponding to more detailed data models. Specifying this option will prioritize these settings over the entity types specified in -E. Available options include: block, transaction, log, token, address_token_balance, erc20_token_transfer, erc721_token_transfer, erc1155_token_transfer, trace, contract, coin_balance.
461+
The list of output types to export, corresponding to more detailed data models. When specified, this option takes precedence over the entity types given with `-E`. Available options include `Block`, `Transaction`, `Log`, `Token`, and `AddressTokenBalance`, among others.
395462

396463
You may spawn multiple Hemera Indexer processes, each specifying different output types, to accelerate indexing. For example, indexing `trace` data may take much longer than other entities, so you may want to run a separate process for it. Check out `docker-compose/docker-compose.yaml` for examples.
397464

Diff for: docs/README.md

+24-35
Original file line numberDiff line numberDiff line change
@@ -227,23 +227,22 @@ Follow the instructions about how to set up a PostgreSQL database here: [Setup P
227227

228228
Configure the `OUTPUT` or `--output` parameter according to your PostgreSQL role information. Check out [Configure Hemera Indexer](#output-or---output) for details.
229229

230-
E.g. `postgresql+psycopg2://${YOUR_USER}:${YOUR_PASSWORD}@${YOUR_HOST}:5432/${YOUR_DATABASE}`.
230+
E.g. `postgresql://${YOUR_USER}:${YOUR_PASSWORD}@${YOUR_HOST}:5432/${YOUR_DATABASE}`.
231231

232232
#### Run
233233

234234
Please check out [Configure Hemera Indexer](#configure-hemera-indexer) on how to configure the indexer.
235235

236236
```bash
237237
python hemera.py stream \
238-
--provider-uri https://eth.llamarpc.com \
239-
--debug-provider-uri https://eth.llamarpc.com \
240-
--postgres-url postgresql+psycopg2://devuser:devpassword@localhost:5432/hemera_indexer \
241-
--output jsonfile://output/eth_blocks_20000001_20010000/json,csvfile://output/hemera_indexer/csv,postgresql+psycopg2://devuser:devpassword@localhost:5432/eth_blocks_20000001_20010000 \
238+
--provider-uri https://ethereum.publicnode.com \
239+
--postgres-url postgresql://devuser:devpassword@localhost:5432/hemera_indexer \
240+
--output jsonfile://output/eth_blocks_20000001_20010000/json,csvfile://output/hemera_indexer/csv,postgresql://devuser:devpassword@localhost:5432/eth_blocks_20000001_20010000 \
242241
--start-block 20000001 \
243242
--end-block 20010000 \
244243
# alternatively you can spin up a separate process for traces, as it takes more time
245244
# --entity-types trace,contract,coin_balance
246-
--entity-types block,transaction,log,token,token_transfer \
245+
--entity-types EXPLORER_BASE \
247246
--block-batch-size 200 \
248247
--batch-size 200 \
249248
--max-workers 32
@@ -303,19 +302,18 @@ E.g., If you specify the `OUTPUT` or `--output` parameter as below
303302
```bash
304303
# Command line parameter
305304
python hemera.py stream \
306-
--provider-uri https://eth.llamarpc.com \
307-
--debug-provider-uri https://eth.llamarpc.com \
308-
--postgres-url postgresql+psycopg2://devuser:devpassword@localhost:5432/hemera_indexer \
309-
--output jsonfile://output/eth_blocks_20000001_20010000/json,csvfile://output/hemera_indexer/csv,postgresql+psycopg2://devuser:devpassword@localhost:5432/eth_blocks_20000001_20010000 \
305+
--provider-uri https://ethereum.publicnode.com \
306+
--postgres-url postgresql://devuser:devpassword@localhost:5432/hemera_indexer \
307+
--output jsonfile://output/eth_blocks_20000001_20010000/json,csvfile://output/hemera_indexer/csv,postgresql://devuser:devpassword@localhost:5432/eth_blocks_20000001_20010000 \
310308
--start-block 20000001 \
311309
--end-block 20010000 \
312-
--entity-types block,transaction,log,token,token_transfer \
310+
--entity-types EXPLORER_BASE \
313311
--block-batch-size 200 \
314312
--batch-size 200 \
315313
--max-workers 32
316314

317315
# Or using environment variable
318-
export OUTPUT = postgresql+psycopg2://user:password@localhost:5432/hemera_indexer,jsonfile://output/json, csvfile://output/csv
316+
export OUTPUT=postgresql://user:password@localhost:5432/hemera_indexer,jsonfile://output/json,csvfile://output/csv
319317
```
320318

321319
You will be able to find those results in the `output` folder of your current location.
@@ -349,7 +347,19 @@ The URI of the web3 debug rpc provider, e.g. `file://$HOME/Library/Ethereum/geth
349347
#### `POSTGRES_URL` or `--postgres-url`
350348

351349
[**Required**]
352-
The PostgreSQL connection URL that the Hemera Indexer used to maintain its state. e.g. `postgresql+psycopg2://user:[email protected]:5432/postgres`.
350+
The PostgreSQL connection URL that the Hemera Indexer uses to maintain its state, e.g. `postgresql://user:[email protected]:5432/postgres`.
351+
352+
353+
#### `ENTITY_TYPES` or `--entity-types` or `-E`
354+
355+
[**Default**: `EXPLORER_BASE`]
356+
The list of entity types to export. e.g. `EXPLORER_BASE`, `EXPLORER_TOKEN`, `EXPLORER_TRACE`.
357+
358+
#### `OUTPUT_TYPES` or `--output-types` or `-O`
359+
360+
The list of output types to export, corresponding to more detailed data models. When specified, this option takes precedence over the entity types given with `-E`. Available options include `Block`, `Transaction`, `Log`, `Token`, and `AddressTokenBalance`, among others.
361+
362+
You may spawn multiple Hemera Indexer processes, each specifying different output types, to accelerate indexing. For example, indexing `trace` data may take much longer than other entities, so you may want to run a separate process for it. Check out `docker-compose/docker-compose.yaml` for examples.
353363

354364
#### `OUTPUT` or `--output`
355365

@@ -361,32 +371,11 @@ The file location will be relative to your current location if you run from sour
361371

362372
e.g.
363373

364-
- `postgresql+psycopg2://user:password@localhost:5432/hemera_indexer`: Output will be exported to your postgres.
374+
- `postgresql://user:password@localhost:5432/hemera_indexer`: Output will be exported to your PostgreSQL database.
365375
- `jsonfile://output/json`: JSON files will be exported to the `output/json` folder.
366376
- `csvfile://output/csv`: CSV files will be exported to the `output/csv` folder.
367377
- `console,jsonfile://output/json,csvfile://output/csv`: Multiple destinations are supported.
368378

369-
#### `ENTITY_TYPES` or `--entity-types`
370-
371-
[**Default**: `BLOCK,TRANSACTION,LOG,TOKEN,TOKEN_TRANSFER`]
372-
Hemera Indexer will export those entity types to your database and files(if `OUTPUT` is specified).
373-
Full list of available entity types:
374-
375-
- `block`
376-
- `transaction`
377-
- `log`
378-
- `token`
379-
- `token_transfer`
380-
- `trace`
381-
- `contract`
382-
- `coin_balance`
383-
- `token_balance`
384-
- `token_ids`
385-
386-
If you didn't specify this parameter, the default entity types will be BLOCK,TRANSACTION,LOG,TOKEN,TOKEN_TRANSFER.
387-
388-
You may spawn up multiple Hemera Indexer processes, each of them indexing different entity types to accelerate the indexing process. For example, indexing `trace` data may take much longer than other entities, you may want to run a separate process to index `trace` data. Checkout `docker-compose/docker-compose.yaml` for examples.
389-
390379
#### `DB_VERSION` or `--db-version`
391380

392381
[**Default**: `head`]

Diff for: indexer/utils/utils.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -174,7 +174,7 @@ def extract_eth_address(input_string):
174174

175175
hex_string = hex_string.zfill(40)
176176
return Web3.to_checksum_address(hex_string).lower()
177-
177+
178178

179179
def flatten(lst):
180180
result = []
