_drafts/2025-02-14-symbolic-of-what.md
+31 -25
Original file line number
Diff line number
Diff line change
@@ -9,18 +9,29 @@ An exercise in symbology.
9
9
10
10
11
11
## Introduction
12
+
13
+
Symbols in Clojure are a simple construct, viewed purely as a data structure.
14
+
But symbols are given meaning by a complex web of interactions among the Lisp reader,
15
+
namespaces, the Clojure compiler, and the Clojure runtime.
12
16
13
-
I remember trying to figure out symbols when I first learned Lisp.
14
-
My predecessor languages (Fortran, Basic, Pascal) had not prepared me.
15
-
(You might guess from that list that my first encounter was some years ago.)
16
-
I was in good shape with symbols across multiple dialects of Lisp over the years,
17
-
though certainly there was non-trivial variation.
17
+
My recent work on ClojureCLR.Next has been on namespaces, the reader, and the parsing phase of the compiler.
18
+
I'll write more about the parser later. It's been a challenge because the original (JVM/CLR) Clojure code
19
+
mashes together a lot of syntactic and semantic analysis. I've been trying to separate these phases for multiple reasons:
18
20
19
-
Clojure forced yet another re-calibration.
20
-
Symbols are a simple construct, but are given meaning by a complex web of interactions
21
-
among the Lisp reader, namespaces, the Clojure compiler and the Clojure runtime.
22
-
I hope to document here where meaning arises.
21
+
- to make the code more modular and easier to understand
22
+
- to make tests easier to write, for example, separate tests for parsing versus type analysis
23
+
- to make debugging easier, for example, by having a simple AST versus a typed AST versus inspectable IL code
24
+
- to get rid of circular references so that the code can be split into multiple files. (The Compiler.java file has 8,500 source lines of code and pretty much no comments. It is _dense_.)
23
25
26
+
One source of complexity in parsing is dealing with the interpretation of symbols.
27
+
(I'm trying to avoid so many jokes here.)
28
+
Nothing like implementation to make one appreciate how complicated this can be.
29
+
If you just write code, you likely don't think about it much -- you know what works.
30
+
Writing the code to make it work is another matter.
31
+
32
+
The code for resolving symbols and translating them in context into nodes in the abstract syntax tree (AST) is complex.
33
+
There appear to be some redundancies that could be eliminated, along with a few other simplifications.
34
+
But for that, I needed more clarity on the rules for symbol interpretation. What follows is not complete, by any means, but it is a starting point.
24
35
25
36
26
37
## Background
@@ -38,8 +49,6 @@ Apparently that is not enough for some.
38
49
- [What are symbols in Clojure?](https://www.reddit.com/r/Clojure/comments/j3b5hc/what_are_symbols_in_clojure/?rdt=63497)
@@ -115,8 +124,8 @@ The reader looks at the next character in the input and decides what to do.
115
124
but also `"` (read a string) and `\` (read a character) -- then call the special reader for that thing.
116
125
- otherwise, we have a _token_.
117
126
118
-
For tokens, we accummulate characters until we hit the end of the input or a charact that can't be in a token.
119
-
Characters that can't be in a token are whitespace or terminating macro character (that includes characters like `(` and `)`)`).
127
+
For tokens, we accumulate characters until we hit the end of the input or a character that can't be in a token.
128
+
Characters that can't be in a token are whitespace and terminating macro characters (these include characters like `(` and `)`).
120
129
For the JVM version of the reader, that is entirely the definition of a token.
121
130
On the CLR, we added `|`-escaping to make it possible to enter CLR typenames that have otherwise unacceptable (terminating) characters;
122
131
this complicates token reading just a bit.
@@ -134,14 +143,14 @@ And that's it. Almost.
134
143
Some of the specialized reader methods must go further and _interpret_ symbols that are encountered during their processing.
135
144
One thinks of _interpretation_ typically as the domain of the evaluator/compiler, not the reader. But in the Clojure reader,
136
145
it cannot be avoided. The Clojure(JVM) and ClojureCLR code for the reader makes this quite apparent;
137
-
there are calls to methods defined over in the `Compiler` and `HostExpr` classes.
146
+
there are calls to methods defined over in the classes defining the compiler.
138
147
For ClojureCLR.Next, I wanted the reader to be defined before I got around to the compiler.
139
148
In particular, because of F# circularity restrictions,
140
-
I didn't want to have put the reader and at least the parser pass of the compiler into one massive file.
149
+
I didn't want to have to put the reader and (at least the parser pass of) the compiler into one massive file.
141
150
I ended up duplicating the compiler methods used by the reader in the reader code itself.
142
151
These duplicates could be simplified -- they don't have to deal with some compiler-specific issues such as local binding scopes.
143
152
144
-
Where does symbol interpretation arise in the reader? Primarily in [syntax quote](https://clojure.org/reference/reader#syntax-quote)
153
+
Where does symbol interpretation arise in the reader? Primarily in [syntax quote](https://clojure.org/reference/reader#syntax-quote).
145
154
146
155
> For Symbols, syntax-quote _resolves_ the symbol in the current context, yielding a fully-qualified symbol (i.e. namespace/name or fully.qualified.Classname). If a symbol is non-namespace-qualified and ends with '#', it is resolved to a generated symbol with the same name to which '_' and a unique id have been appended. e.g. x# will resolve to x_123. All references to that symbol within a syntax-quoted expression resolve to the same generated symbol.
147
156
@@ -160,7 +169,7 @@ mlnn/something ; => 7
160
169
```
161
170
162
171
[NB: I discovered that the current version of ClojureCLR did not do the last line correctly. For the last 15 years.
163
-
By the time you read this, the fix will be in.]
172
+
By the time you read this, the fix will be in. So to speak.]
164
173
165
174
These operations require interpretation of symbols in the context of namespace aliases and type mappings.
166
175
The first step on the road to interpretation begins with namespaces.
@@ -205,7 +214,6 @@ For our purposes here, we can ignore that. What _is_ important here is finding
205
214
The entry points for that are the following.
206
215
207
216
```F#
208
-
209
217
// Get the value a symbol maps to. Typically a Var or a Type.
210
218
member this.getMapping(sym: Symbol) = this.Mappings.valAt (sym)
211
219
@@ -305,11 +313,9 @@ I wrote a debug printer for ASTs. Here is the output of parsing the form above.
305
313
306
314
```Clojure
307
315
Fn ns1$fn__1
308
-
invoke [ x ]
309
-
316
+
invoke [ x ]
310
317
Let [ y
311
-
= 7 (PrimNumeric) ]
312
-
318
+
= 7 (PrimNumeric) ]
313
319
Invoke:
314
320
Var: #'ns1/f
315
321
Invoke:
@@ -355,8 +361,8 @@ Several kinds of AST nodes can be created from symbols. The details of node typ
355
361
but perhaps you can get the gist:
356
362
357
363
- ns/name, ns names a `Type`, that type has a field or property with the given name => InteropCall, type = FieldOrProperty, static
358
-
- ns/name ns names a `Type`, no field or property found, name does not start with a period => QualifiedMethod, Static
359
-
- ns/.name ns names a `Type`, no field or property found, name starts with a period => QualifiedMethod, Instance
364
+
- ns/name, ns names a `Type`, no field or property found, name does not start with a period => QualifiedMethod, Static
365
+
- ns/.name, ns names a `Type`, no field or property found, name starts with a period => QualifiedMethod, Instance
360
366
- ^NotAType TypeName/FieldName, FieldName not in type TypeName => throws because the tag is not a type
361
367
- ^IsAType TypeName/FieldName, FieldName not in type TypeName => QualifiedMethod, Static, IsAType set as tag.
362
368
- ^[...types...] TypeName/FieldName, FieldName not in type TypeName => QualifiedMethod, Static, SignatureHint set
@@ -365,7 +371,7 @@ Without a namespace:
365
371
366
372
- name - has a local binding => Expr.LocalBinding
367
373
- not local, not a type, resolves to a Var, Var is macro => throws
368
-
- not local, not a type, resolves to a Var, Var is has `:sonst true` metadata => Expr.Literal with Var as value
374
+
- not local, not a type, resolves to a Var, Var has `:const true` metadata => Expr.Literal with Var as value
369
375
- not local, not a type, resolves to a Var, Var is not macro, not const => Expr.Var
370
376
- not local, not a type, does not resolve, allow-unresolved = true => Expr.UnresolvedVar
371
377
- not local, not a type, does not resolve, allow-unresolved = false => throws