This document details the Retrieval-Augmented Generation (RAG) process implemented in the C# scripts within the `LLM/Editor/Core` directory. The system leverages past interactions to provide relevant context to the language model. This is achieved by generating embeddings for user prompts, retrieving similar past interactions from a memory log, and augmenting the context for the next LLM call.
## Key Scripts

- `CommandExecutor.cs`: Orchestrates the execution of commands, including triggering the embedding and memory-logging processes.
- `EmbeddingHelper.cs`: Handles the generation of vector embeddings for text prompts by calling the Google Vertex AI API.
- `MemoryRetriever.cs`: Retrieves relevant past interactions (memories) by comparing the new prompt's embedding with stored embeddings.
- `MemoryLogger.cs`: Logs new, successful interactions, including their embeddings, to a persistent JSON file.
- `SessionManager.cs`: Manages session data, including file paths for logs and caches.

The RAG process is integrated into the command execution lifecycle.
## Step 1: Embedding Generation

The process begins in the `CommandExecutor.ExecuteNextCommand` method. `CommandExecutor` calls `EmbeddingHelper.GetEmbedding` to generate a vector embedding for the user's prompt (`_currentUserPrompt`).

## Step 2: The Embedding API Call

The `EmbeddingHelper.GetEmbedding` method makes a REST API call to the Google Cloud Vertex AI platform.
- **Endpoint URL:** `https://{region}-aiplatform.googleapis.com/v1/projects/{gcpProjectId}/locations/{gcpRegion}/publishers/google/models/{embeddingModelName}:predict` (`region`, `gcpProjectId`, and `embeddingModelName` are loaded from `MCPSettings`).
- **HTTP Method:** `POST`
- **Headers:**
  - `Content-Type`: `application/json`
  - `Authorization`: `Bearer {authToken}` (the auth token is retrieved via `apiClient.GetAuthToken`).
- **Request Body Payload:** The user's prompt text is sent within a JSON structure.
```json
{
  "instances": [
    {
      "content": "The user's prompt text goes here."
    }
  ]
}
```
- **Response Body Payload:** The API returns the embedding as a list of floating-point values.
```json
{
  "predictions": [
    {
      "embeddings": {
        "values": [0.01, -0.02, ..., 0.03]
      }
    }
  ]
}
```
The `values` array is extracted and returned as a `float[]`.
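As a concrete illustration, here is a minimal sketch of this request/response flow in C#. It assumes `Newtonsoft.Json` is available (common in Unity projects) and takes the configuration values as parameters; the real `EmbeddingHelper` loads them from `MCPSettings` and also handles auth caching and error cases, so the class shape and signature below are assumptions.

```csharp
// Minimal sketch of the GetEmbedding flow described above. Parameter names and
// the static-class shape are illustrative, not the project's actual signature.
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public static class EmbeddingHelperSketch
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<float[]> GetEmbedding(
        string prompt, string region, string gcpProjectId,
        string embeddingModelName, string authToken)
    {
        // Endpoint documented above; the placeholders come from MCPSettings.
        string url = $"https://{region}-aiplatform.googleapis.com/v1/projects/{gcpProjectId}"
                   + $"/locations/{region}/publishers/google/models/{embeddingModelName}:predict";

        // Build the "instances" payload shown above.
        var payload = new JObject
        {
            ["instances"] = new JArray { new JObject { ["content"] = prompt } }
        };

        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(payload.ToString(), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", authToken);

        HttpResponseMessage response = await Client.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // Extract predictions[0].embeddings.values and return it as a float[].
        JObject json = JObject.Parse(await response.Content.ReadAsStringAsync());
        return json["predictions"][0]["embeddings"]["values"].ToObject<float[]>();
    }
}
```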
## Step 3: Memory Retrieval

Before sending the final prompt to the generative LLM, the system retrieves relevant context from past interactions:

1. `MemoryRetriever.GetRelevantMemories` is called.
2. The new prompt is embedded using the `EmbeddingHelper.GetEmbedding` flow described in Step 2.
3. All `InteractionRecord` objects are loaded from the `memory_log.json` file using `MemoryLogger.GetRecords()`. This log is stored in the session cache path (e.g., `Application.persistentDataPath/LLMCache/{session_id}/memory_log.json`).
4. The new embedding is compared against each stored record's `PromptEmbedding` using the Cosine Similarity algorithm (sketched below).
5. The `topK` (default is 3) `InteractionRecord`s with the highest similarity scores are returned.

The list of relevant `InteractionRecord`s retrieved in Step 3 is then used to augment the prompt sent to the main generative LLM. This provides the LLM with examples of similar, successful past interactions, improving the quality and relevance of its response. (The exact formatting of this augmented prompt is handled by the calling client.)
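A minimal sketch of the ranking step follows. `InteractionRecordStub` and the method names are stand-ins; only the cosine-similarity scoring and top-K selection are grounded in the description above.

```csharp
// Illustrative sketch of the cosine-similarity ranking described above.
// InteractionRecordStub stands in for the project's InteractionRecord.
using System;
using System.Collections.Generic;
using System.Linq;

public class InteractionRecordStub
{
    public float[] PromptEmbedding;
    // UserPrompt, LLMResponse, Outcome, etc. omitted for brevity.
}

public static class MemoryRetrieverSketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    public static float CosineSimilarity(float[] a, float[] b)
    {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Small epsilon guards against division by zero for empty vectors.
        return dot / ((float)Math.Sqrt(normA) * (float)Math.Sqrt(normB) + 1e-8f);
    }

    // Rank all stored records against the new prompt embedding, keep the top K.
    public static List<InteractionRecordStub> GetRelevantMemories(
        float[] promptEmbedding, List<InteractionRecordStub> records, int topK = 3)
    {
        return records
            .OrderByDescending(r => CosineSimilarity(promptEmbedding, r.PromptEmbedding))
            .Take(topK)
            .ToList();
    }
}
```

A brute-force scan like this is reasonable because the memory log is a small per-session JSON file; a vector index would only matter at much larger scales.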
## Step 4: Memory Logging

After a command sequence is successfully executed, the interaction is logged to ensure it can be retrieved in the future:

1. `CommandExecutor` creates an `InteractionRecord` instance. This record contains:
   - `UserPrompt`: The original user prompt.
   - `LLMResponse`: The `CommandData` returned by the LLM.
   - `Outcome`: The final status (e.g., `CommandOutcome.Success`).
   - `PromptEmbedding`: The embedding vector generated in Step 1.
2. `MemoryLogger.AddRecord` is called with this new record.
3. The updated log is written to the `memory_log.json` file by `SaveMemoryLog()`. This makes the new interaction available for future memory retrieval operations.

This cyclical process ensures that the system continuously learns from user interactions to provide more accurate and context-aware assistance.
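A minimal sketch of the record shape and the load/append/save cycle, assuming `Newtonsoft.Json` for serialization. The `CommandData`/`CommandOutcome` stubs stand in for the project's own types, the log path shown is a placeholder for the session cache path managed by `SessionManager`, and the field list also includes the `Feedback` and `isMultiStep` fields described later in this document.

```csharp
// Illustrative sketch only; field types are inferred from this document.
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public class CommandData { /* the LLM's structured command; project type */ }
public enum CommandOutcome { Success, Error }

public class InteractionRecord
{
    public string UserPrompt;          // the original user prompt
    public CommandData LLMResponse;    // the CommandData returned by the LLM
    public CommandOutcome Outcome;     // final status, e.g. Success
    public string Feedback;            // error message or success indicator
    public bool isMultiStep;           // part of a larger plan?
    public float[] PromptEmbedding;    // vector generated in Step 1
}

public static class MemoryLoggerSketch
{
    // Placeholder; the real path comes from SessionManager, e.g.
    // Application.persistentDataPath/LLMCache/{session_id}/memory_log.json.
    public static string LogPath = "memory_log.json";

    public static List<InteractionRecord> GetRecords() =>
        File.Exists(LogPath)
            ? JsonConvert.DeserializeObject<List<InteractionRecord>>(File.ReadAllText(LogPath))
            : new List<InteractionRecord>();

    public static void AddRecord(InteractionRecord record)
    {
        // Load, append, and persist the whole log (SaveMemoryLog in the real code).
        var records = GetRecords();
        records.Add(record);
        File.WriteAllText(LogPath, JsonConvert.SerializeObject(records, Formatting.Indented));
    }
}
```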
## CommandExecutor in the RAG Lifecycle

The `CommandExecutor` is the central orchestrator for the RAG process, tying together the generation of embeddings and the logging of memories. Its involvement ensures that every user interaction has the potential to become a future piece of retrievable context.
### `ExecuteNextCommand`

The RAG-related logic is primarily contained within the `ExecuteNextCommand` method.
#### Pre-Execution: Embedding Generation

1. `ExecuteNextCommand` is called, and before the command logic itself runs, it prepares to create an embedding for the current user prompt (`_currentUserPrompt`).
2. The embedding is generated via `await EmbeddingHelper.GetEmbedding(...)`.
3. The resulting `float[] promptEmbedding` is stored in a local variable. This is a crucial step: the semantic meaning of the user's request is captured and vectorized before any action is taken, ensuring the embedding represents the user's raw intent.

#### Command Execution

The appropriate command (e.g., `CreateFileCommand`, `ModifyCodeCommand`) is instantiated and its `Execute()` method is called. The outcome (`Success`, `Error`, etc.) is determined.

#### Post-Execution: Memory Logging (in the `finally` block)
The `finally` block guarantees that the interaction is logged regardless of the command's outcome:

1. An `InteractionRecord` is created, containing:
   - `UserPrompt`: The original text from the user.
   - `LLMResponse`: The `CommandData` that was executed.
   - `Outcome`: The result of the command execution.
   - `Feedback`: Any error message or a success indicator.
   - `isMultiStep`: A flag indicating whether it was part of a larger plan.
   - `PromptEmbedding`: The embedding vector generated in Step 1, attached to the record.
2. The `InteractionRecord` is then passed to `MemoryLogger.AddRecord()`.
3. `MemoryLogger` saves this record to the `memory_log.json` file, making it a permanent "memory" that can be retrieved by the `MemoryRetriever` in subsequent user prompts.
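A condensed, illustrative sketch of these bookends is shown below. Only the control flow (embed, execute, log in `finally`) is grounded in this document; `ICommand`, `BuildCommand`, and the `EmbeddingHelper` stub are hypothetical stand-ins, and the record/logger types are reused from the Step 4 sketch above.

```csharp
// Illustrative control-flow sketch of ExecuteNextCommand's RAG bookends.
// Stubbed types stand in for the project's own; error handling is elided.
using System;
using System.Threading.Tasks;

public interface ICommand { CommandOutcome Execute(); }

public static class EmbeddingHelper
{
    // Stub; the real flow is the Vertex AI REST call sketched in Step 2.
    public static Task<float[]> GetEmbedding(string prompt) =>
        Task.FromResult(Array.Empty<float>());
}

public class CommandExecutorSketch
{
    private string _currentUserPrompt;
    private CommandData _currentCommandData;

    public async Task ExecuteNextCommand()
    {
        // Pre-execution: vectorize the user's raw intent before anything runs.
        float[] promptEmbedding = await EmbeddingHelper.GetEmbedding(_currentUserPrompt);

        var outcome = CommandOutcome.Error;
        string feedback = string.Empty;
        try
        {
            // Execution: instantiate the command chosen by the LLM and run it.
            ICommand command = BuildCommand(_currentCommandData);
            outcome = command.Execute();
            feedback = "Success";
        }
        catch (Exception e)
        {
            feedback = e.Message;
            throw;
        }
        finally
        {
            // Post-execution: the finally block logs the interaction either way.
            MemoryLoggerSketch.AddRecord(new InteractionRecord
            {
                UserPrompt = _currentUserPrompt,
                LLMResponse = _currentCommandData,
                Outcome = outcome,
                Feedback = feedback,
                PromptEmbedding = promptEmbedding
            });
        }
    }

    // Hypothetical factory mapping CommandData to a concrete command such as
    // CreateFileCommand or ModifyCodeCommand.
    private ICommand BuildCommand(CommandData data) =>
        throw new NotImplementedException();
}
```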
By handling both the initial embedding and the final logging, `CommandExecutor` acts as the bookends for the RAG lifecycle of a single interaction. It captures the user's intent as a vector and then archives the entire interaction, along with that vector, for future use.