Commit 1447518

Committed Feb 16, 2025
Sigh. More edits on Symbology
1 parent 8729915 commit 1447518

1 file changed: +31 -25 lines changed

_drafts/2025-02-14-symbolic-of-what.md

@@ -9,18 +9,29 @@ An exercise in symbology.
 
 
 ## Introduction
+
+Symbols in Clojure are a simple construct, viewed purely as a data structure.
+But symbols are given meaning by a complex web of interactions among the Lisp reader,
+namespaces, the Clojure compiler, and the Clojure runtime.
 
-I remember trying to figure out symbols when I first learned Lisp.
-My predecessor languages (Fortran, Basic, Pascal) had not prepared me.
-(You might guess from that list that my first encounter was some years ago.)
-I was in good shape with symbols across multiple dialects of Lisp over the years,
-though certainly there was non-trivial variation.
+My recent work on ClojureCLR.Next has been on namespaces, the reader, and the parsing phase of the compiler.
+I'll write more about the parser later. It's been a challenge because the original (JVM/CLR) Clojure code
+mashes together a lot of syntactic and semantic analysis. I've been trying to separate these phases for multiple reasons:
 
-Clojure forced yet another re-calibration.
-Symbols are a simple construct, but are given meaning by a complex web of interactions
-among the Lisp reader, namespaces, the Clojure compiler and the Clojure runtime.
-I hope to document here where meaning arises.
+- to make the code more modular and easier to understand
+- to enhance writing tests, for example, separate tests for parsing versus type analysis
+- to make debugging easier, for example, by having a simple AST versus a typed AST versus inspectable IL code
+- to get rid of circular references so that the code can be split into multiple files. (The Compiler.java file has 8,500 source lines of code and pretty much no comments. It is _dense_.)
 
+One source of complexity in parsing is dealing with the interpretation of symbols.
+(I'm trying to avoid so many jokes here.)
+Nothing like implementation to make one appreciate how complicated this can be.
+If you just write code, you likely don't think about it much -- you know what works.
+Writing the code to make it work is another matter.
+
+The code for resolving symbols and translating them in context into nodes in the abstract syntax tree (AST) is complex.
+There appear to be some redundancies that could be eliminated, along with a few other simplifications.
+But for that, I needed more clarity on the rules for symbol interpretation. What follows is not complete, by any means, but it is a starting point.
 
 
 ## Background
@@ -38,8 +49,6 @@ Apparently that is not enough for some.
 - [What are symbols in Clojure?](https://www.reddit.com/r/Clojure/comments/j3b5hc/what_are_symbols_in_clojure/?rdt=63497)
 - [Explain Clojure Symbols](https://stackoverflow.com/questions/1175920/explain-clojure-symbols)
 
-The first article is especially releveant.
-
 So much for preparation.
 
 # Naked symbolism
@@ -115,8 +124,8 @@ The reader looks at the next character in the input and decides what to do.
 but also `"` (read a string) and `\` (read a character) -- then call the special reader for that thing.
 - otherwise, we have a _token_.
 
-For tokens, we accummulate characters until we hit the end of the input or a charact that can't be in a token.
-Characters that can't be in a token are whitespace or terminating macro character (that includes characters like `(` and `)`)`).
+For tokens, we accumulate characters until we hit the end of the input or a character that can't be in a token.
+Characters that can't be in a token are whitespace or terminating macro characters (that includes characters like `(` and `)`).
 For the JVM version of the reader, that is entirely the definition of a token.
 On the CLR, we added `|`-escaping to make it possible to enter CLR typenames that have otherwise unacceptable (terminating) characters;
 this complicates token reading just a bit.
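
The token rule described in this hunk can be sketched in a few lines of Clojure. This is only an illustration under stated assumptions, not the ClojureCLR.Next reader: `read-token`, `whitespace?`, and `terminating-macro-chars` are hypothetical names, the two character sets are assumptions made for the sketch, and the CLR `|`-escaping is left out entirely.

```clojure
;; Assumed set of terminating macro characters, for illustration only.
;; A real reader would consult its macro table instead.
(def terminating-macro-chars
  #{\( \) \[ \] \{ \} \" \; \@ \^ \` \~ \\})

;; Assumed whitespace set; the comma is included because Clojure
;; treats it as whitespace.
(def whitespace?
  #{\space \tab \newline \return \,})

(defn read-token
  "Accumulate characters from the start of s until the end of input,
   a whitespace character, or a terminating macro character."
  [s]
  (apply str
         (take-while #(not (or (whitespace? %)
                               (terminating-macro-chars %)))
                     s)))
```

With these assumed sets, `(read-token "foo/bar) rest")` stops at the closing parenthesis and yields `"foo/bar"`, while `'` and `#` stay inside a token because the sketch treats them as non-terminating.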
@@ -134,14 +143,14 @@ And that's it. Almost.
 Some of the specialized reader methods must go further and _interpret_ symbols that are encountered during their processing.
 One thinks of _interpretation_ typically as the domain of the evaluator/compiler, not the reader. But in the Clojure reader,
 it cannot be avoided. The Clojure(JVM) and ClojureCLR code for the reader makes this quite apparent;
-there are calls to methods defined over in the `Compiler` and `HostExpr` classes.
+there are calls to methods defined over in the classes defining the compiler.
 For ClojureCLR.Next, I wanted the reader to be defined before I got around to the compiler.
 In particular, because of F# circularity restrictions,
-I didn't want to have put the reader and at least the parser pass of the compiler into one massive file.
+I didn't want to have to put the reader and (at least the parser pass of) the compiler into one massive file.
 I ended up duplicating the compiler methods used by the reader in the reader code itself.
 These duplicates could be simplified -- they don't have to deal with some compiler-specific issues such as local binding scopes.
 
-Where does symbol interpretation arise in the reader? Primarily in [syntax quote](https://clojure.org/reference/reader#syntax-quote)
+Where does symbol interpretation arise in the reader? Primarily in [syntax quote](https://clojure.org/reference/reader#syntax-quote).
 
 > For Symbols, syntax-quote _resolves_ the symbol in the current context, yielding a fully-qualified symbol (i.e. namespace/name or fully.qualified.Classname). If a symbol is non-namespace-qualified and ends with '#', it is resolved to a generated symbol with the same name to which '_' and a unique id have been appended. e.g. x# will resolve to x_123. All references to that symbol within a syntax-quoted expression resolve to the same generated symbol.
 
@@ -160,7 +169,7 @@ mlnn/something ; => 7
 ```
 
 [NB: I discovered that the current version of ClojureCLR did not do the last line correctly. For the last 15 years.
-By the time you read this, the fix will be in.]
+By the time you read this, the fix will be in. So to speak.]
 
 These operations require interpretation of symbols in the context of namespace aliases and type mappings.
 The first step on the road to interpretation begins with namespaces.
@@ -205,7 +214,6 @@ For our purposes here, we can ignore that. What _is_ important here is finding
 The entry points for that are the following.
 
 ```F#
-
 // Get the value a symbol maps to. Typically a Var or a Type.
 member this.getMapping(sym: Symbol) = this.Mappings.valAt (sym)
 
@@ -305,11 +313,9 @@ I wrote a debug printer for ASTs. Here is the output of parsing the form above.
 
 ```Clojure
 Fn ns1$fn__1
-invoke [ x ]
-
+invoke [ x ]
 Let [ y
-= 7 (PrimNumeric) ]
-
+= 7 (PrimNumeric) ]
 Invoke:
 Var: #'ns1/f
 Invoke:
@@ -355,8 +361,8 @@ Several kinds of AST nodes can be created from symbols. The details of node typ
 but perhaps you can get the gist:
 
 - ns/name, ns names a `Type`, that type has a field or property with the given name => InteropCall, type = FieldOrProperty, static
-- ns/name ns names a `Type`, no field or property found, name does not start with a period => QualifiedMethod, Static
-- ns/.name ns names a `Type`, no field or property found, name starts with a period => QualifiedMethod, Instance
+- ns/name, ns names a `Type`, no field or property found, name does not start with a period => QualifiedMethod, Static
+- ns/.name, ns names a `Type`, no field or property found, name starts with a period => QualifiedMethod, Instance
 - ^NotAType TypeName/FieldName, FieldName not in type TypeName => throws because the tag is not a type
 - ^IsAType TypeName/FieldName, FieldName not in type TypeName => QualifiedMethod, Static, IsAType set as tag.
 - ^[...types...] TypeName/FieldName, FieldName not in type TypeName => QualifiedMethod, Static, SignatureHint set
@@ -365,7 +371,7 @@ Without a namespace:
 
 - name - has a local binding => Expr.LocalBinding
 - not local, not a type, resolves to a Var, Var is macro => throws
-- not local, not a type, resolves to a Var, Var is has `:sonst true` metadata => Expr.Literal with Var as value
+- not local, not a type, resolves to a Var, Var has `:const true` metadata => Expr.Literal with Var as value
 - not local, not a type, resolves to a Var, Var is not macro, not const => Expr.Var
 - not local, not a type, does not resolve, allow-unresolved = true => Expr.UnresolvedVar
 - not local, not a type, does not resolve, allow-unresolved = false => throws
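
The no-namespace rules in this hunk read as a decision cascade, which the following Clojure sketch tries to mirror. Everything in it is an assumption made for illustration: the function name, the option keys, and the keyword results are hypothetical stand-ins for the `Expr` cases named above, and a Var is modeled as a plain map of its metadata. The list does not say what happens when the symbol names a type, so that branch is only a placeholder.

```clojure
(defn classify-bare-symbol
  "Classify a symbol that has no namespace part.
   resolved-var is a hypothetical map of Var metadata, or nil if
   the symbol did not resolve to a Var."
  [{:keys [local? type? resolved-var allow-unresolved?]}]
  (cond
    ;; A local binding wins over everything else.
    local? :local-binding

    ;; Placeholder: the type case is not covered by the list above.
    type? :type-case-not-listed

    ;; No Var found: allowed only when unresolved symbols are permitted.
    (nil? resolved-var)
    (if allow-unresolved?
      :unresolved-var
      (throw (ex-info "Unable to resolve symbol" {})))

    ;; A macro Var cannot be used as a value.
    (:macro resolved-var)
    (throw (ex-info "Can't take the value of a macro" {}))

    ;; A Var with :const true metadata becomes a literal.
    (:const resolved-var) :literal

    ;; Otherwise an ordinary Var reference.
    :else :var))
```

For example, `(classify-bare-symbol {:resolved-var {:const true}})` returns `:literal`, and `(classify-bare-symbol {})` throws because nothing resolves and unresolved symbols are not allowed.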
