# RAG Functionality in LLM/Editor/Core

This document details the Retrieval-Augmented Generation (RAG) process implemented in the C# scripts within the `LLM/Editor/Core` directory. The system leverages past interactions to provide relevant context to the language model. This is achieved by generating embeddings for user prompts, retrieving similar past interactions from a memory log, and augmenting the context for the next LLM call.

## Core Components

- **`CommandExecutor.cs`**: Orchestrates the execution of commands, including triggering the embedding and memory logging processes.
- **`EmbeddingHelper.cs`**: Handles the generation of vector embeddings for text prompts by calling the Google Vertex AI API.
- **`MemoryRetriever.cs`**: Retrieves relevant past interactions (memories) by comparing the new prompt's embedding with stored embeddings.
- **`MemoryLogger.cs`**: Logs new, successful interactions, including their embeddings, to a persistent JSON file.
- **`SessionManager.cs`**: Manages session data, including file paths for logs and caches.

## Step-by-Step RAG Flow

The RAG process is integrated into the command execution lifecycle.

### 1. User Prompt and Initial Embedding

1. **Trigger**: A user submits a prompt, which initiates the `CommandExecutor.ExecuteNextCommand` method.
2. **Embedding Generation**: Before executing the command, the `CommandExecutor` calls `EmbeddingHelper.GetEmbedding` to generate a vector embedding for the user's prompt (`_currentUserPrompt`).

### 2. API Hit: Google Vertex AI for Embeddings

The `EmbeddingHelper.GetEmbedding` method makes a REST API call to the Google Cloud Vertex AI platform.

- **Endpoint URL**:

  ```
  https://{region}-aiplatform.googleapis.com/v1/projects/{gcpProjectId}/locations/{gcpRegion}/publishers/google/models/{embeddingModelName}:predict
  ```

- The `region`, `gcpProjectId`, and `embeddingModelName` are loaded from `MCPSettings`.
- **HTTP Method**: `POST`
- **Headers**:
  - `Content-Type`: `application/json`
  - `Authorization`: `Bearer {authToken}` (The auth token is retrieved via `apiClient.GetAuthToken`.)
- **Request Body Payload**: The user's prompt text is sent within a JSON structure.

  ```json
  {
    "instances": [
      { "content": "The user's prompt text goes here." }
    ]
  }
  ```

- **Response Body Payload**: The API returns the embedding as a list of floating-point values.

  ```json
  {
    "predictions": [
      {
        "embeddings": {
          "values": [0.01, -0.02, ..., 0.03]
        }
      }
    ]
  }
  ```

  The `values` array is extracted and returned as a `float[]`.

### 3. Memory Retrieval (The "R" in RAG)

Before sending the final prompt to the generative LLM, the system retrieves relevant context from past interactions.

1. **`MemoryRetriever.GetRelevantMemories`** is called.
2. It first generates an embedding for the new user prompt using the exact same `EmbeddingHelper.GetEmbedding` flow described in Step 2.
3. It loads all historical `InteractionRecord` objects from the `memory_log.json` file using `MemoryLogger.GetRecords()`. This log is stored in the session cache path (e.g., `Application.persistentDataPath/LLMCache/{session_id}/memory_log.json`).
4. For each stored record, it compares the new prompt's embedding to the record's `PromptEmbedding` using **Cosine Similarity**.
5. The method returns a list of the `topK` (default is 3) `InteractionRecord`s with the highest similarity scores.

### 4. Context Augmentation (The "A" in RAG)

The list of relevant `InteractionRecord`s retrieved in Step 3 is then used to augment the prompt sent to the main generative LLM. This provides the LLM with examples of similar, successful past interactions, improving the quality and relevance of its response. (The exact formatting of this augmented prompt is handled by the calling client.)

### 5. Logging for Future Retrieval

After a command sequence is successfully executed, the interaction is logged to ensure it can be retrieved in the future.

1.
   **`CommandExecutor`** creates an `InteractionRecord` instance. This record contains:
   - `UserPrompt`: The original user prompt.
   - `LLMResponse`: The `CommandData` returned by the LLM.
   - `Outcome`: The final status (e.g., `CommandOutcome.Success`).
   - `PromptEmbedding`: The embedding vector generated in Step 1.
2. **`MemoryLogger.AddRecord`** is called with this new record.
3. The record is added to an in-memory list and then persisted to the `memory_log.json` file by `SaveMemoryLog()`. This makes the new interaction available for future memory retrieval operations.

This cyclical process ensures that the system continuously learns from user interactions to provide more accurate and context-aware assistance.

## The Role of `CommandExecutor` in the RAG Lifecycle

The `CommandExecutor` is the central orchestrator for the RAG process, tying together the generation of embeddings and the logging of memories. Its involvement ensures that every user interaction has the potential to be a future piece of retrievable context.

### Detailed Flow within `ExecuteNextCommand`

The RAG-related logic is primarily contained within the `ExecuteNextCommand` method.

1. **Pre-Execution: Embedding Generation**
   - As soon as `ExecuteNextCommand` is called, and before the command logic itself runs, it prepares to create an embedding for the current user prompt (`_currentUserPrompt`).
   - It retrieves the API client and calls `await EmbeddingHelper.GetEmbedding(...)`.
   - The resulting `float[] promptEmbedding` is stored in a local variable. This is a crucial step: the semantic meaning of the user's request is captured and vectorized *before* any action is taken. This ensures the embedding represents the user's raw intent.
2. **Command Execution**
   - The relevant command (e.g., `CreateFileCommand`, `ModifyCodeCommand`) is instantiated and its `Execute()` method is called. The outcome (`Success`, `Error`, etc.) is determined.
3.
   **Post-Execution: Memory Logging (in the `finally` block)**
   - The `finally` block guarantees that the interaction is logged regardless of the command's outcome.
   - A new `InteractionRecord` is created.
   - This record is populated with the full context of the event:
     - `UserPrompt`: The original text from the user.
     - `LLMResponse`: The `CommandData` that was executed.
     - `Outcome`: The result of the command execution.
     - `Feedback`: Any error message or a success indicator.
     - `isMultiStep`: A flag indicating if it was part of a larger plan.
     - `PromptEmbedding`: **The embedding vector generated in Step 1 is attached to the record.**
   - This complete `InteractionRecord` is then passed to `MemoryLogger.AddRecord()`.
   - `MemoryLogger` saves this record to the `memory_log.json` file, making it a permanent "memory" that can be retrieved by the `MemoryRetriever` in subsequent user prompts.

By handling both the initial embedding and the final logging, `CommandExecutor` acts as the bookends for the RAG lifecycle of a single interaction. It captures the user's intent as a vector and then archives the entire interaction, along with that vector, for future use.
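Taken together, the retrieve-and-log halves of this cycle can be sketched in a few dozen lines. The sketch below is illustrative Python, not the project's C#: the snake_case names are stand-ins for the C# members (`InteractionRecord`, `MemoryRetriever.GetRelevantMemories`, `topK`), and the embedding vectors are supplied directly rather than fetched from the Vertex AI endpoint described in Step 2.

```python
import json
import math
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class InteractionRecord:
    # Mirrors the C# InteractionRecord fields described above (illustrative names).
    user_prompt: str
    llm_response: str
    outcome: str
    prompt_embedding: List[float] = field(default_factory=list)

def cosine_similarity(a: List[float], b: List[float]) -> float:
    # The ranking metric MemoryRetriever applies to stored embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def get_relevant_memories(query_embedding: List[float],
                          records: List[InteractionRecord],
                          top_k: int = 3) -> List[InteractionRecord]:
    # Step 3: score every stored record against the new prompt's embedding
    # and keep the top_k most similar (topK defaults to 3 in the C# code).
    ranked = sorted(records,
                    key=lambda r: cosine_similarity(query_embedding,
                                                    r.prompt_embedding),
                    reverse=True)
    return ranked[:top_k]

def log_interaction(record: InteractionRecord, log: List[dict]) -> str:
    # Step 5: append the finished interaction, embedding included, so it can be
    # retrieved next time; the C# side persists this list as memory_log.json.
    log.append(asdict(record))
    return json.dumps(log, indent=2)
```

Because the embedding is captured before the command runs and attached to the record after it finishes, each call to `log_interaction` makes the interaction immediately retrievable by `get_relevant_memories` on the next prompt.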