
RAG Functionality in LLM/Editor/Core

This document details the Retrieval-Augmented Generation (RAG) process implemented in the C# scripts within the LLM/Editor/Core directory. The system leverages past interactions to provide relevant context to the language model. This is achieved by generating embeddings for user prompts, retrieving similar past interactions from a memory log, and augmenting the context for the next LLM call.

Core Components

  • CommandExecutor.cs: Orchestrates the execution of commands, including triggering the embedding and memory logging processes.
  • EmbeddingHelper.cs: Handles the generation of vector embeddings for text prompts by calling the Google Vertex AI API.
  • MemoryRetriever.cs: Retrieves relevant past interactions (memories) by comparing the new prompt's embedding with stored embeddings.
  • MemoryLogger.cs: Logs new, successful interactions, including their embeddings, to a persistent JSON file.
  • SessionManager.cs: Manages session data, including file paths for logs and caches.
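
For orientation, the sketch below approximates one plausible shape for these components and how they hand data to one another. The parameter lists, the IApiClient interface, and the sync/async choices are assumptions; the class names, the float[] embedding type, and the division of responsibilities come from the flow described below.

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    // Hypothetical component shapes; the signatures are assumptions, while the
    // names, return types, and responsibilities follow this document.
    public interface IApiClient
    {
        // The document references apiClient.GetAuthToken for the Authorization
        // header; the exact signature is assumed here.
        string GetAuthToken();
    }

    public class InteractionRecord { /* fields sketched under Step 5 */ }

    public static class EmbeddingHelper
    {
        // Step 2: POST the prompt to the Vertex AI ":predict" endpoint.
        public static Task<float[]> GetEmbedding(IApiClient apiClient, string prompt)
            => throw new NotImplementedException();
    }

    public static class MemoryRetriever
    {
        // Step 3: embed the new prompt, then rank stored records by cosine similarity.
        public static Task<List<InteractionRecord>> GetRelevantMemories(
            IApiClient apiClient, string prompt, int topK = 3)
            => throw new NotImplementedException();
    }

    public static class MemoryLogger
    {
        // Step 5: append a record and persist the log to memory_log.json.
        public static void AddRecord(InteractionRecord record) { }
        public static List<InteractionRecord> GetRecords() => new List<InteractionRecord>();
    }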

Step-by-Step RAG Flow

The RAG process is integrated into the command execution lifecycle.

1. User Prompt and Initial Embedding

  1. Trigger: A user submits a prompt, which initiates the CommandExecutor.ExecuteNextCommand method.
  2. Embedding Generation: Before executing the command, the CommandExecutor calls EmbeddingHelper.GetEmbedding to generate a vector embedding for the user's prompt (_currentUserPrompt).

2. API Hit: Google Vertex AI for Embeddings

The EmbeddingHelper.GetEmbedding method makes a REST API call to the Google Cloud Vertex AI platform.

  • Endpoint URL:

    https://{gcpRegion}-aiplatform.googleapis.com/v1/projects/{gcpProjectId}/locations/{gcpRegion}/publishers/google/models/{embeddingModelName}:predict
    
    • The gcpRegion (used in both the hostname and the path), gcpProjectId, and embeddingModelName are loaded from MCPSettings.
  • HTTP Method: POST

  • Headers:

    • Content-Type: application/json
    • Authorization: Bearer {authToken} (The auth token is retrieved via apiClient.GetAuthToken).
  • Request Body Payload: The user's prompt text is sent within a JSON structure.

    {
      "instances": [
        {
          "content": "The user's prompt text goes here."
        }
      ]
    }
    
  • Response Body Payload: The API returns the embedding as a list of floating-point values.

    {
      "predictions": [
        {
          "embeddings": {
            "values": [0.01, -0.02, ..., 0.03]
          }
        }
      ]
    }
    

    The values array is extracted and returned as a float[] (a sketch of the full call follows).
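
This document does not show EmbeddingHelper's HTTP stack, so the following is a minimal sketch of the call using HttpClient and System.Text.Json. For self-containment, the settings values are passed as explicit parameters instead of being read from MCPSettings, and retries and error handling are omitted.

    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Text;
    using System.Text.Json;
    using System.Threading.Tasks;

    public static class EmbeddingCallSketch
    {
        static readonly HttpClient Http = new HttpClient();

        // Minimal sketch of the Step 2 call; parameters mirror the URL placeholders above.
        public static async Task<float[]> GetEmbedding(
            string authToken, string gcpRegion, string gcpProjectId,
            string embeddingModelName, string prompt)
        {
            string url = $"https://{gcpRegion}-aiplatform.googleapis.com/v1" +
                         $"/projects/{gcpProjectId}/locations/{gcpRegion}" +
                         $"/publishers/google/models/{embeddingModelName}:predict";

            // Request body: {"instances":[{"content":"..."}]}
            string body = JsonSerializer.Serialize(
                new { instances = new[] { new { content = prompt } } });

            using var request = new HttpRequestMessage(HttpMethod.Post, url)
            {
                Content = new StringContent(body, Encoding.UTF8, "application/json")
            };
            request.Headers.Authorization =
                new AuthenticationHeaderValue("Bearer", authToken);

            using HttpResponseMessage response = await Http.SendAsync(request);
            response.EnsureSuccessStatusCode();

            // Extract predictions[0].embeddings.values into a float[].
            using JsonDocument doc = JsonDocument.Parse(
                await response.Content.ReadAsStringAsync());
            JsonElement values = doc.RootElement
                .GetProperty("predictions")[0]
                .GetProperty("embeddings")
                .GetProperty("values");

            var result = new float[values.GetArrayLength()];
            int i = 0;
            foreach (JsonElement v in values.EnumerateArray())
                result[i++] = v.GetSingle();
            return result;
        }
    }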

3. Memory Retrieval (The "R" in RAG)

Before sending the final prompt to the generative LLM, the system retrieves relevant context from past interactions.

  1. MemoryRetriever.GetRelevantMemories is called.
  2. It first generates an embedding for the new user prompt using the exact same EmbeddingHelper.GetEmbedding flow described in Step 2.
  3. It loads all historical InteractionRecord objects from the memory_log.json file using MemoryLogger.GetRecords(). This log is stored in the session cache path (e.g., Application.persistentDataPath/LLMCache/{session_id}/memory_log.json).
  4. For each stored record, it compares the new prompt's embedding to the record's PromptEmbedding using the Cosine Similarity algorithm.
  5. The method returns a list of the topK (default is 3) InteractionRecords with the highest similarity scores (see the sketch below).
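
Cosine similarity is the dot product of the two embedding vectors divided by the product of their Euclidean norms: identical directions score 1.0, and unrelated directions score near 0. The retriever's source is not reproduced here, but the ranking step might look like this sketch (using LINQ for the top-K selection and the InteractionRecord shape described in Step 5):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class RetrievalSketch
    {
        // cos(a, b) = (a . b) / (|a| * |b|); returns 0 for mismatched or zero vectors.
        public static float CosineSimilarity(float[] a, float[] b)
        {
            if (a == null || b == null || a.Length != b.Length) return 0f;
            float dot = 0f, magA = 0f, magB = 0f;
            for (int i = 0; i < a.Length; i++)
            {
                dot  += a[i] * b[i];
                magA += a[i] * a[i];
                magB += b[i] * b[i];
            }
            if (magA == 0f || magB == 0f) return 0f;
            return dot / (float)(Math.Sqrt(magA) * Math.Sqrt(magB));
        }

        // Rank every stored record against the new prompt's embedding and
        // keep the topK most similar (default 3, as noted above).
        public static List<InteractionRecord> GetRelevantMemories(
            float[] promptEmbedding, List<InteractionRecord> records, int topK = 3)
        {
            return records
                .OrderByDescending(r => CosineSimilarity(promptEmbedding, r.PromptEmbedding))
                .Take(topK)
                .ToList();
        }
    }

Cosine similarity is a natural choice here because it compares direction rather than magnitude, so prompts of different lengths can still match on meaning.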

4. Context Augmentation (The "A" in RAG)

The list of relevant InteractionRecords retrieved in Step 3 is then used to augment the prompt sent to the main generative LLM. This provides the LLM with examples of similar, successful past interactions, improving the quality and relevance of its response. (The exact formatting of this augmented prompt is handled by the calling client).
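
Because that formatting is client-defined, the example below is purely illustrative: one plausible way a caller could fold the retrieved records into the prompt sent to the generative model.

    using System.Collections.Generic;
    using System.Text;

    public static class AugmentationSketch
    {
        // Illustrative only; the real layout is decided by the calling client.
        public static string BuildAugmentedPrompt(
            string userPrompt, List<InteractionRecord> memories)
        {
            var sb = new StringBuilder();
            sb.AppendLine("Relevant past interactions:");
            foreach (InteractionRecord m in memories)
            {
                sb.AppendLine($"- Prompt: {m.UserPrompt}");
                sb.AppendLine($"  Outcome: {m.Outcome}");
            }
            sb.AppendLine();
            sb.AppendLine($"Current request: {userPrompt}");
            return sb.ToString();
        }
    }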

5. Logging for Future Retrieval

After a command sequence is successfully executed, the interaction is logged to ensure it can be retrieved in the future.

  1. CommandExecutor creates an InteractionRecord instance. This record contains:
    • UserPrompt: The original user prompt.
    • LLMResponse: The CommandData returned by the LLM.
    • Outcome: The final status (e.g., CommandOutcome.Success).
    • PromptEmbedding: The embedding vector generated in Step 1.
  2. MemoryLogger.AddRecord is called with this new record.
  3. The record is added to an in-memory list and then persisted to the memory_log.json file by SaveMemoryLog(). This makes the new interaction available for future memory retrieval operations (see the sketch below).
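
Putting the fields above together, the record and the logging step might look like the sketch below. The field names (including Feedback and isMultiStep, which are described in the next section) come from this document; the property types, the serializer, and the explicit logPath parameter are assumptions.

    using System.Collections.Generic;
    using System.IO;
    using System.Text.Json;

    // Outcome values beyond Success and Error are not enumerated in this document.
    public enum CommandOutcome { Success, Error }

    // Field names follow this document; the types are assumptions.
    public class InteractionRecord
    {
        public string UserPrompt { get; set; }
        public object LLMResponse { get; set; }    // the CommandData returned by the LLM
        public CommandOutcome Outcome { get; set; }
        public string Feedback { get; set; }
        public bool isMultiStep { get; set; }      // casing as written in this document
        public float[] PromptEmbedding { get; set; }
    }

    public static class LoggingSketch
    {
        static readonly List<InteractionRecord> Records = new List<InteractionRecord>();

        // Append to the in-memory list, then persist the whole log. The logPath
        // would come from SessionManager (e.g. .../LLMCache/{session_id}/memory_log.json).
        public static void AddRecord(InteractionRecord record, string logPath)
        {
            Records.Add(record);
            SaveMemoryLog(logPath);
        }

        static void SaveMemoryLog(string logPath)
            => File.WriteAllText(logPath, JsonSerializer.Serialize(Records));
    }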

This cyclical process ensures that the system continuously learns from user interactions to provide more accurate and context-aware assistance.

The Role of CommandExecutor in the RAG Lifecycle

The CommandExecutor is the central orchestrator for the RAG process, tying together the generation of embeddings and the logging of memories. Its involvement ensures that every user interaction has the potential to be a future piece of retrievable context.

Detailed Flow within ExecuteNextCommand

The RAG-related logic is primarily contained within the ExecuteNextCommand method; a consolidated skeleton follows the numbered steps below.

  1. Pre-Execution: Embedding Generation

    • As soon as ExecuteNextCommand is called, and before any command logic runs, it generates an embedding for the current user prompt (_currentUserPrompt).
    • It retrieves the API client and calls await EmbeddingHelper.GetEmbedding(...).
    • The resulting float[] promptEmbedding is stored in a local variable. This is a crucial step: the semantic meaning of the user's request is captured and vectorized before any action is taken, ensuring the embedding represents the user's raw intent.
  2. Command Execution

    • The relevant command (e.g., CreateFileCommand, ModifyCodeCommand) is instantiated and its Execute() method is called. The outcome (Success, Error, etc.) is determined.
  3. Post-Execution: Memory Logging (in the finally block)

    • The finally block guarantees that the interaction is logged regardless of the command's outcome.
    • A new InteractionRecord is created.
    • This record is populated with the full context of the event:
      • UserPrompt: The original text from the user.
      • LLMResponse: The CommandData that was executed.
      • Outcome: The result of the command execution.
      • Feedback: Any error message or a success indicator.
      • isMultiStep: A flag indicating if it was part of a larger plan.
      • PromptEmbedding: The embedding vector generated in step 1 is attached to the record.
    • This complete InteractionRecord is then passed to MemoryLogger.AddRecord().
    • MemoryLogger saves this record to the memory_log.json file, making it a permanent "memory" that can be retrieved by the MemoryRetriever in subsequent user prompts.
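
The skeleton below consolidates these steps, reusing the shapes from the earlier sketches (IApiClient, EmbeddingHelper, MemoryLogger, InteractionRecord, CommandOutcome). The embed-then-execute-then-log ordering and the finally block follow this document; the surrounding class, the catch clause that captures feedback, and the RunCommand stand-in are simplified assumptions.

    using System;
    using System.Threading.Tasks;

    public class CommandExecutorSketch
    {
        readonly IApiClient _apiClient;
        string _currentUserPrompt;   // set when the user submits a prompt
        object _currentCommand;      // stands in for the CommandData chosen by the LLM
        bool _isMultiStep;

        public CommandExecutorSketch(IApiClient apiClient) => _apiClient = apiClient;

        public async Task ExecuteNextCommand()
        {
            // 1. Pre-execution: vectorize the raw prompt before any action runs,
            //    so the embedding captures the user's unmodified intent.
            float[] promptEmbedding =
                await EmbeddingHelper.GetEmbedding(_apiClient, _currentUserPrompt);

            var outcome = CommandOutcome.Error;
            string feedback = null;
            try
            {
                // 2. Instantiate and execute the selected command
                //    (e.g. CreateFileCommand, ModifyCodeCommand).
                outcome = RunCommand(_currentCommand);
                feedback = "Success";
            }
            catch (Exception ex)
            {
                feedback = ex.Message;   // error message becomes the record's Feedback
                throw;
            }
            finally
            {
                // 3. Post-execution: the finally block guarantees the interaction
                //    is logged regardless of the command's outcome.
                MemoryLogger.AddRecord(new InteractionRecord
                {
                    UserPrompt      = _currentUserPrompt,
                    LLMResponse     = _currentCommand,
                    Outcome         = outcome,
                    Feedback        = feedback,
                    isMultiStep     = _isMultiStep,
                    PromptEmbedding = promptEmbedding
                });
            }
        }

        // Stand-in for the real command dispatch; always "succeeds" in this sketch.
        CommandOutcome RunCommand(object command) => CommandOutcome.Success;
    }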

By handling both the initial embedding and the final logging, CommandExecutor acts as the bookends for the RAG lifecycle of a single interaction. It captures the user's intent as a vector and then archives the entire interaction, along with that vector, for future use.