[Spelunker] Remove input entities; improve retry logic; better logging; and more (#681)

This also stops turning type (alias) definitions into their own chunks --
there are too many and they are too small.
gvanrossum-ms authored Feb 7, 2025
1 parent 53720e7 commit c48267b
Showing 10 changed files with 255 additions and 264 deletions.
26 changes: 15 additions & 11 deletions ts/packages/agents/spelunker/design.md
@@ -6,28 +6,32 @@ The Spelunker context contains a list of folders considered the "focus".
This is set by the "setFocusToFolders" action, which takes a list of folder names,
normalizes them, and replaces the previous focus with this list.
Normalization first replaces leading `~` with `$HOME`, then makes paths absolute.
Entries that are not directories are skipped.
The list may be cleared.
Focus persists in a session (i.e., it is preserved when the CLI or shell is killed and restarted).
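The normalization rules above can be sketched in a few lines of Python. This is a hypothetical helper, not the actual Spelunker code; the function name is made up for illustration:

```python
import os

def normalize_focus_folders(folders: list[str]) -> list[str]:
    """Sketch of focus normalization: expand `~` to $HOME, make paths
    absolute, and skip entries that are not directories."""
    result: list[str] = []
    for folder in folders:
        # Leading `~` is replaced with $HOME, then the path is made absolute.
        absolute = os.path.abspath(os.path.expanduser(folder))
        # Entries that are not directories are skipped.
        if os.path.isdir(absolute):
            result.append(absolute)
    return result
```

The returned list would replace the previous focus wholesale, matching the "replaces the previous focus with this list" behavior described above.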

# Generating answers

Questions about the focused code base are answered roughly as follows:

1. Gather all relevant files. (E.g. `**/*.py`)
2. Chunkify locally (using chunker.py)
3. Send chunks for each file separately, in parallel, to a cheap, fast LLM
1. Gather all relevant source files. (E.g. `**/*.{py,ts}`)
2. Chunkify locally (using chunker.py or typescriptChunker.ts)
3. Send batches of chunks, in parallel, to a cheap, fast LLM
with a prompt asking it to find chunks relevant to the user question.
4. Sort by relevance, keep top `N`. (E.g. `N = 30`)
5. Send the selected chunks as context to a smart model
5. Send the selected chunks as context to a smart model (the "oracle")
with the request to answer the user question using those chunks as context.
6. Construct a result from the answer and the chunks used to come up with it.
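The selection step (4) above can be sketched as follows. `ScoredChunk` and `select_top_chunks` are hypothetical names, and the hard-coded relevance scores stand in for what the cheap model would return in step 3:

```python
from dataclasses import dataclass

@dataclass
class ScoredChunk:
    chunk_id: str
    text: str
    relevance: float  # assigned by the cheap, fast LLM in step 3

def select_top_chunks(scored: list[ScoredChunk], n: int = 30) -> list[ScoredChunk]:
    # Step 4: sort by relevance, keep the top N.
    return sorted(scored, key=lambda c: c.relevance, reverse=True)[:n]

# Usage: pretend the cheap model already scored three chunks.
scored = [
    ScoredChunk("ch1", "def foo(): ...", 0.2),
    ScoredChunk("ch2", "class Spelunker: ...", 0.9),
    ScoredChunk("ch3", "import os", 0.5),
]
top = select_top_chunks(scored, n=2)
# The smart "oracle" model (step 5) would then answer using `top` as context.
```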

## TO DO
## How easy is it to target other languages?

- Need a chunker for each language; the rest is the same.
- Chunking TypeScript was, realistically, a week's work.

- Try to cache chunks we've encountered.
- Prompt engineering (burrow from John Lam?)
- Ranking chunks (does the model do a good enough job?)
- Do we need a "global index" like John Lam's ask.py?
## TO DO

- How easy is it to target other languages?
- Need a chunker for each language; the rest is the same.
- Prompt engineering (borrow from John Lam?)
- Evaluation of selection process (does the model do a good enough job?)
- Scaling. It takes 60-80 seconds to select from ~4000 chunks.
- Do we need a "global index" (of summaries) like John Lam's ask.py?
How to make that scale?
1 change: 1 addition & 0 deletions ts/packages/agents/spelunker/package.json
@@ -28,6 +28,7 @@
"aiclient": "workspace:*",
"better-sqlite3": "11.8.1",
"code-processor": "workspace:*",
"common-utils": "workspace:*",
"typeagent": "workspace:*",
"typechat": "^0.1.1",
"typescript": "^5.4.2"
2 changes: 1 addition & 1 deletion ts/packages/agents/spelunker/src/chunkSchema.ts
@@ -8,7 +8,7 @@ export type ChunkId = string;
export interface Blob {
start: number; // int; 0-based!
lines: string[];
breadcrumb?: boolean;
breadcrumb?: ChunkId | undefined;
}

export interface Chunk {
13 changes: 7 additions & 6 deletions ts/packages/agents/spelunker/src/chunker.py
@@ -31,27 +31,27 @@
from typing import Any, Iterator


IdType = str


@dataclass
class Blob:
"""A sequence of text lines plus some metadata."""

start: int # 0-based!
lines: list[str]
breadcrumb: bool = False # True to ignore on reconstruction
breadcrumb: IdType | None = None # Chunk id if breadcrumb

def to_dict(self) -> dict[str, object]:
result: dict[str, Any] = {
"start": self.start,
"lines": self.lines,
}
if self.breadcrumb:
result["breadcrumb"] = True
result["breadcrumb"] = self.breadcrumb
return result


IdType = str


@dataclass
class Chunk:
"""A chunk at any level of nesting (root, inner, leaf)."""
@@ -240,6 +240,7 @@ def create_chunks_recursively(
parent.children.append(node_id)

# Split last parent.blobs[-1] into two, leaving a gap for the new Chunk
# and put a breadcrumb blob in between.
parent_blob: Blob = parent.blobs.pop()
parent_start: int = parent_blob.start
parent_end: int = parent_blob.start + len(parent_blob.lines)
@@ -250,7 +251,7 @@
parent.blobs.append(Blob(parent_start, lines[parent_start:first_start]))
summary = summarize_chunk(chunk, node)
if summary:
parent.blobs.append(Blob(first_start, summary, breadcrumb=True))
parent.blobs.append(Blob(first_start, summary, breadcrumb=node_id))
parent.blobs.append(Blob(last_end, lines[last_end:parent_end]))

return chunks
