This document details the Retrieval-Augmented Generation (RAG) process implemented in the C# scripts within the LLM/Editor/Core directory. The system leverages past interactions to provide relevant context to the language model. This is achieved by generating embeddings for user prompts, retrieving similar past interactions from a memory log, and augmenting the context for the next LLM call.
The key scripts involved are:

CommandExecutor.cs: Orchestrates the execution of commands, including triggering the embedding and memory logging processes.
EmbeddingHelper.cs: Handles the generation of vector embeddings for text prompts by calling the Google Vertex AI API.
MemoryRetriever.cs: Retrieves relevant past interactions (memories) by comparing the new prompt's embedding with stored embeddings.
MemoryLogger.cs: Logs new, successful interactions, including their embeddings, to a persistent JSON file.
SessionManager.cs: Manages session data, including file paths for logs and caches.

The RAG process is integrated into the command execution lifecycle.
The process begins in the CommandExecutor.ExecuteNextCommand method. CommandExecutor calls EmbeddingHelper.GetEmbedding to generate a vector embedding for the user's prompt (_currentUserPrompt). The EmbeddingHelper.GetEmbedding method makes a REST API call to the Google Cloud Vertex AI platform.
Endpoint URL:
https://{region}-aiplatform.googleapis.com/v1/projects/{gcpProjectId}/locations/{gcpRegion}/publishers/google/models/{embeddingModelName}:predict
The region, gcpProjectId, and embeddingModelName values are loaded from MCPSettings.

HTTP Method: POST
Headers:
Content-Type: application/json
Authorization: Bearer {authToken} (the auth token is retrieved via apiClient.GetAuthToken)

Request Body Payload: The user's prompt text is sent within a JSON structure.
{
"instances": [
{
"content": "The user's prompt text goes here."
}
]
}
Response Body Payload: The API returns the embedding as a list of floating-point values.
{
"predictions": [
{
"embeddings": {
"values": [0.01, -0.02, ..., 0.03]
}
}
]
}
The values array is extracted and returned as a float[].
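As an illustration only, the sketch below shows how such a predict call could be made and parsed in C#. It assumes the Newtonsoft.Json package and a plain HttpClient; the class name, method signature, parameter names, and parsing details are hypothetical and do not reproduce EmbeddingHelper's exact internals.

using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical sketch of a Vertex AI embedding request; not the actual EmbeddingHelper code.
public static class EmbeddingSketch
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<float[]> GetEmbedding(
        string prompt, string region, string projectId, string model, string authToken)
    {
        string url = $"https://{region}-aiplatform.googleapis.com/v1/projects/{projectId}" +
                     $"/locations/{region}/publishers/google/models/{model}:predict";

        // Request body mirrors the "instances" payload shown above.
        string body = JsonConvert.SerializeObject(new
        {
            instances = new[] { new { content = prompt } }
        });

        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", authToken);

        HttpResponseMessage response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // Extract predictions[0].embeddings.values and return it as a float[].
        JObject json = JObject.Parse(await response.Content.ReadAsStringAsync());
        return json["predictions"][0]["embeddings"]["values"].ToObject<float[]>();
    }
}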
Before sending the final prompt to the generative LLM, the system retrieves relevant context from past interactions.
The retrieval flow is as follows:

MemoryRetriever.GetRelevantMemories is called.
An embedding for the new prompt is generated via the EmbeddingHelper.GetEmbedding flow described in Step 2.
The stored InteractionRecord objects are loaded from the memory_log.json file using MemoryLogger.GetRecords(). This log is stored in the session cache path (e.g., Application.persistentDataPath/LLMCache/{session_id}/memory_log.json).
The new prompt's embedding is compared against each record's PromptEmbedding using the Cosine Similarity algorithm.
The topK (default is 3) InteractionRecords with the highest similarity scores are returned.

The list of relevant InteractionRecords retrieved in Step 3 is then used to augment the prompt sent to the main generative LLM. This provides the LLM with examples of similar, successful past interactions, improving the quality and relevance of its response. (The exact formatting of this augmented prompt is handled by the calling client.)
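As a rough illustration of the similarity ranking, the sketch below computes cosine similarity between two vectors and selects the topK most similar items. The helper names and the generic selector are assumptions for the example; MemoryRetriever's actual implementation may differ.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of cosine-similarity ranking; not the actual MemoryRetriever code.
public static class RetrievalSketch
{
    public static float CosineSimilarity(float[] a, float[] b)
    {
        float dot = 0f, magA = 0f, magB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        if (magA == 0f || magB == 0f) return 0f;
        return dot / (float)(Math.Sqrt(magA) * Math.Sqrt(magB));
    }

    // Returns the topK items whose embeddings are most similar to the query embedding,
    // e.g. TopKBySimilarity(promptEmbedding, records, r => r.PromptEmbedding).
    public static List<T> TopKBySimilarity<T>(
        float[] queryEmbedding, IEnumerable<T> items, Func<T, float[]> embeddingOf, int topK = 3)
    {
        return items
            .Where(item => embeddingOf(item) != null)
            .OrderByDescending(item => CosineSimilarity(queryEmbedding, embeddingOf(item)))
            .Take(topK)
            .ToList();
    }
}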
After a command sequence is successfully executed, the interaction is logged to ensure it can be retrieved in the future.
CommandExecutor creates an InteractionRecord instance. This record contains:
UserPrompt: The original user prompt.
LLMResponse: The CommandData returned by the LLM.
Outcome: The final status (e.g., CommandOutcome.Success).
PromptEmbedding: The embedding vector generated in Step 1.

MemoryLogger.AddRecord is called with this new record, and the updated log is written to the memory_log.json file by SaveMemoryLog(). This makes the new interaction available for future memory retrieval operations.

This cyclical process ensures that the system continuously learns from user interactions to provide more accurate and context-aware assistance.
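A minimal sketch of what the persisted record and logging step could look like follows. The field types, the logger class, and the use of Newtonsoft.Json for serialization are assumptions for illustration; the real InteractionRecord and MemoryLogger carry additional fields and logic.

using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Illustrative shape of a logged interaction; the real InteractionRecord differs in detail.
public class InteractionRecordSketch
{
    public string UserPrompt;
    public string LLMResponse;     // the CommandData, serialized, in the real system
    public string Outcome;         // e.g. "Success"
    public float[] PromptEmbedding;
}

// Hypothetical logger that appends records to a JSON file such as memory_log.json.
public class MemoryLoggerSketch
{
    private readonly string _logPath;   // e.g. the session cache path + "/memory_log.json"
    private readonly List<InteractionRecordSketch> _records;

    public MemoryLoggerSketch(string logPath)
    {
        _logPath = logPath;
        _records = File.Exists(logPath)
            ? JsonConvert.DeserializeObject<List<InteractionRecordSketch>>(File.ReadAllText(logPath))
            : new List<InteractionRecordSketch>();
    }

    public IReadOnlyList<InteractionRecordSketch> GetRecords() => _records;

    // Appends the record and rewrites the JSON log so it can be retrieved in future sessions.
    public void AddRecord(InteractionRecordSketch record)
    {
        _records.Add(record);
        File.WriteAllText(_logPath, JsonConvert.SerializeObject(_records, Formatting.Indented));
    }
}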
CommandExecutor in the RAG Lifecycle

The CommandExecutor is the central orchestrator for the RAG process, tying together the generation of embeddings and the logging of memories. Its involvement ensures that every user interaction has the potential to be a future piece of retrievable context.
ExecuteNextCommand

The RAG-related logic is primarily contained within the ExecuteNextCommand method.
Pre-Execution: Embedding Generation
When ExecuteNextCommand is called, and before the command logic itself runs, it generates an embedding for the current user prompt (_currentUserPrompt) via await EmbeddingHelper.GetEmbedding(...). The resulting float[] promptEmbedding is stored in a local variable. This is a crucial step: the semantic meaning of the user's request is captured and vectorized before any action is taken, ensuring the embedding represents the user's raw intent.

Command Execution
The specific command class (e.g., CreateFileCommand, ModifyCodeCommand) is instantiated and its Execute() method is called. The outcome (Success, Error, etc.) is determined.

Post-Execution: Memory Logging (in the finally block)
The finally block guarantees that the interaction is logged regardless of the command's outcome. An InteractionRecord is created with the following fields:

UserPrompt: The original text from the user.
LLMResponse: The CommandData that was executed.
Outcome: The result of the command execution.
Feedback: Any error message or a success indicator.
isMultiStep: A flag indicating if it was part of a larger plan.
PromptEmbedding: The embedding vector generated in step 1, attached to the record.

The InteractionRecord is then passed to MemoryLogger.AddRecord(). MemoryLogger saves this record to the memory_log.json file, making it a permanent "memory" that can be retrieved by the MemoryRetriever in subsequent user prompts.

By handling both the initial embedding and the final logging, CommandExecutor acts as the bookends for the RAG lifecycle of a single interaction. It captures the user's intent as a vector and then archives the entire interaction, along with that vector, for future use.
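A simplified, self-contained sketch of this control flow is shown below. The stub methods stand in for EmbeddingHelper.GetEmbedding, the command's Execute() call, and MemoryLogger.AddRecord; the real method signature, outcome type, and record shape differ.

using System;
using System.Threading.Tasks;

// Control-flow sketch of the RAG "bookends"; all members here are illustrative stand-ins.
public static class RagLifecycleSketch
{
    public static async Task ExecuteNextCommand(string currentUserPrompt)
    {
        // 1. Pre-execution: vectorize the raw user intent before anything runs.
        float[] promptEmbedding = await GetEmbeddingStub(currentUserPrompt);

        string outcome = "Error";
        string feedback = null;
        try
        {
            // 2. Execution: run the command chosen by the LLM.
            feedback = ExecuteCommandStub();
            outcome = "Success";
        }
        catch (Exception ex)
        {
            feedback = ex.Message;
            throw;
        }
        finally
        {
            // 3. Post-execution: always log the interaction plus its embedding,
            //    so it becomes a retrievable memory for future prompts.
            AddRecordStub(currentUserPrompt, promptEmbedding, outcome, feedback);
        }
    }

    // Stand-ins for EmbeddingHelper.GetEmbedding, command.Execute(), and MemoryLogger.AddRecord.
    private static Task<float[]> GetEmbeddingStub(string prompt) => Task.FromResult(new[] { 0f });
    private static string ExecuteCommandStub() => "OK";
    private static void AddRecordStub(string prompt, float[] embedding, string outcome, string feedback) { }
}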