Commit 1bf4a4b

Expand on antlr parser documentation
1 parent c8982dd commit 1bf4a4b

1 file changed

+23
-7
lines changed

docs/GrammarParser.md

+23-7
Original file line numberDiff line numberDiff line change
@@ -3,16 +3,24 @@
33
This documentation details the design and implementation of the Harmony programming language parser.
44

55
- [Grammar Parser](#grammar-parser)
6+
- [Requirements](#requirements)
7+
- [Generating the Lexer and Parser](#generating-the-lexer-and-parser)
68
- [Modifying Harmony.g4](#modifying-harmonyg4)
7-
- [Expressions](#expressions)
89
- [Statements](#statements)
9-
- [Embedded Python Code in `Harmony.g4`](#embedded-python-code-in-harmonyg4)
1010
- [Modifying `custom_denter.py`](#modifying-custom_denterpy)
11+
- [Modifying `HarmonyErrorListener.py`](#modifying-harmonyerrorlistenerpy)
12+
- [Using the Lexer and Parser](#using-the-lexer-and-parser)
13+
14+
## Requirements
15+
16+
To use the `antlr` parser generator, the `Java` runtime must be installed. The committed `antlr-x.y.z-complete.jar` file can then be run to generate the lexer and parser files.
17+
18+
## Generating the Lexer and Parser
19+
The lexer and parser are generated in the `harmony_model_checker/parser` directory by running `make parser`. This creates `HarmonyLexer.py`, `HarmonyParser.py`, and `HarmonyVisitor.py`, which are to be committed for convenience.
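The Makefile behind `make parser` is not shown here, so the following is only a sketch of what such a target might run. The helper name `antlr_command`, its parameters, and the choice of flags are assumptions; `-Dlanguage=Python3`, `-visitor`, `-no-listener`, and `-o` are standard ANTLR 4 tool options, chosen because the documented output is a lexer, a parser, and a visitor.

```python
def antlr_command(jar_path, grammar="Harmony.g4",
                  out_dir="harmony_model_checker/parser"):
    """Build a Java command line that runs the ANTLR tool (hypothetical helper).

    jar_path is the committed antlr-x.y.z-complete.jar; the exact version
    is left to the caller because it is not spelled out in this document.
    """
    return [
        "java", "-jar", jar_path,
        "-Dlanguage=Python3",  # emit Python 3 sources
        "-visitor",            # generate HarmonyVisitor.py
        "-no-listener",        # assumed: no listener file is mentioned
        "-o", out_dir,         # write generated files here
        grammar,
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` would run the generator, provided a Java runtime is on the `PATH`.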
1120

1221
## Modifying Harmony.g4
1322

14-
The `Harmony.g4` defines the grammar rules (statements and expressions) for the Harmony programming
15-
language written using `ANTLR 4.9.3`.
23+
The `Harmony.g4` file defines the grammar rules (statements and expressions) for the Harmony programming language and is written for `ANTLR 4.9.3`.
1624

1725
> **NOTE**: The version of the Python3 antlr4 runtime must match
1826
the parser generator version exactly. That is, if the ANTLR parser generator version is change
@@ -22,15 +30,23 @@ compiler's dependency.
2230
If the `Harmony.g4` is modified, then run `make parser` to generate the updated `HarmonyLexer.py`,
2331
`HarmonyParser.py`, and `HarmonyVisitor.py` files. **Any changes to `HarmonyVisitor.py`, such as new, deleted, or modified method headers, must be reflected in the HarmonyVisitor implementation `antlr_rule_visitor.py`**.
2432

25-
### Expressions
26-
2733
### Statements
2834

29-
### Embedded Python Code in `Harmony.g4`
35+
The grammar file defines the `Harmony` language as a sequence of statements, where each statement is one of 1) an import statement, 2) a compound statement, or 3) a simple statement. Each statement can optionally begin with a label name followed by a colon. Alternatively, a block of indented statements can follow a label/colon.
3036

3137
## Modifying `custom_denter.py`
38+
The grammar file contains embedded `Python` code that modifies the generated lexer's indentation behavior, which is implemented by `custom_denter.py`.
3239

3340
The `ModifiedDenterHelper` class implemented in this file extends the `DenterHelper` class of the
3441
`antlr-denter` dependency and overrides the internal behavior. The main behavior of interest is handling
3542
expressions that are broken into separate lines. Without modification, the default `DenterHelper` class would create `INDENT` and `DEDENT` tokens in between expression tokens, which would be difficult to parse using ANTLR 4. The extended implementation adds a check when the current token is a new indent while the previous observed token is an operator, such as `+` or `/`, which would be followed by another expression. If that's the case, then the observed `INDENT` (and the `DEDENT` token that would have also been output) is ignored.
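The rule described above can be illustrated with a small, self-contained sketch. It does not use the real `ModifiedDenterHelper` or `antlr-denter` API: tokens are plain strings, and the function name, the token names, and the operator set are all assumptions made for illustration.

```python
# Hypothetical stand-ins for the lexer's token types.
INDENT, DEDENT = "INDENT", "DEDENT"
# Assumed subset of operators that must be followed by another expression.
CONTINUATION_OPERATORS = {"+", "-", "*", "/"}


def suppress_continuation_indents(tokens):
    """Drop an INDENT that directly follows an operator, and its DEDENT."""
    out = []
    indent_stack = []  # one entry per open INDENT: True if it was suppressed
    prev = None        # last token that was actually emitted
    for tok in tokens:
        if tok == INDENT:
            suppressed = prev in CONTINUATION_OPERATORS
            indent_stack.append(suppressed)
            if suppressed:
                continue  # the expression simply continues on the next line
        elif tok == DEDENT and indent_stack:
            if indent_stack.pop():
                continue  # matching DEDENT of a suppressed INDENT
        out.append(tok)
        prev = tok
    return out
```

With this filter, an expression such as `x = 1 +` continued on an indented line yields the same token sequence as if it had been written on one line, while the `INDENT`/`DEDENT` pair around an ordinary block is left untouched.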
3643

44+
## Modifying `HarmonyErrorListener.py`
45+
46+
This file implements an error handler for syntax errors encountered in the lexer or the parser. The classes override antlr's `ErrorListener` class and its `syntaxError` method. Some of the error messages created by `antlr` are replaced with custom messages that are more suitable and convenient for Harmony users.
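As a rough illustration of this pattern, the sketch below subclasses `ErrorListener` and overrides `syntaxError`, whose signature is the one used by the `antlr4` Python runtime. The class name `CollectingErrorListener`, the stored message format, and the particular message rewrite are assumptions and are not taken from `HarmonyErrorListener.py`.

```python
try:
    from antlr4.error.ErrorListener import ErrorListener
except ImportError:
    # Fallback so the sketch can be read and run without the antlr4 runtime.
    ErrorListener = object


class CollectingErrorListener(ErrorListener):
    """Hypothetical listener that records shortened syntax error messages."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        self.errors = []

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        # Assumed rewrite: replace one verbose default message with a short one.
        if msg.startswith("mismatched input"):
            msg = "unexpected token"
        self.errors.append(f"{self.filename}:{line}:{column}: {msg}")
```

A listener like this is typically installed by calling `removeErrorListeners()` and then `addErrorListener(listener)` on both the lexer and the parser.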
47+
48+
## Using the Lexer and Parser
49+
50+
In `harmony_model_checker/compile.py`, the lexer and parser are created for us in the `_build_parser` function.
51+
52+
Of note is the `_build_input_stream` function, which takes either a `filename` or a `str_value` keyword argument. If `filename` is given, then it creates a `FileStream` to handle `utf-8`-encoded files (by default, the antlr `FileStream` decodes files as ascii). Otherwise, if `str_value` is given, then it creates an `InputStream` such that the string is treated as the content of a Harmony program. `FileStream` and `InputStream` are classes provided by the `antlr4` library.
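A minimal sketch of that dispatch is shown below. It is not the code from `compile.py`: the function name `build_input_stream_sketch` and the error raised when neither argument is given are assumptions. `FileStream(fileName, encoding=...)` and `InputStream(data)` are the constructors provided by the `antlr4` Python runtime.

```python
def build_input_stream_sketch(*, filename=None, str_value=None):
    """Return an antlr4 character stream for a file or an in-memory string."""
    if filename is None and str_value is None:
        # Assumed behavior; the real function may handle this differently.
        raise ValueError("expected a filename or a str_value argument")

    # Imported here so the argument check works without the runtime installed.
    from antlr4 import FileStream, InputStream

    if filename is not None:
        # Override the ascii default so UTF-8 source files decode correctly.
        return FileStream(filename, encoding="utf-8")
    # Treat the string itself as the content of a Harmony program.
    return InputStream(str_value)
```

The resulting stream would then be handed to the generated `HarmonyLexer`, whose tokens feed `HarmonyParser`.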
