commit e3fa853059
Author: Sujith (21 hours ago)

    Added logs

+ 4 - 4
Assets/LLM/source/Documentation/1_HighLevel_Data_Guide.md

@@ -1,4 +1,4 @@
-# LLM Guide: High-Level Project Data (`HighLevel.zip`)
+# LLM Guide: High-Level Project Data (`HighLevel/`)
 
 ## 1. Purpose
 
@@ -14,7 +14,7 @@ This data level provides a **high-level, bird's-eye view** of the entire Unity p
 
 ## 2. File Structure
 
-The zip file will contain the following structure:
+The `HighLevel` folder will contain the following structure:
 
 ```
 HighLevel/
@@ -85,11 +85,11 @@ This file is a direct copy of the project's `Packages/manifest.json`. It lists a
 
 ## 4. Handling Shrunken / Tokenized Data
 
-The JSON files in this zip may have been processed by `json_reducer.py` to save space. You can identify this if the root object contains a `key_mapper` and a `data` key.
+The JSON files in this folder may have been processed by `json_reducer.py` to save space. You can identify this if the root object contains a `key_mapper` and a `data` key.
 
 **If the data is shrunken:**
 1.  The file content will look like this: `{"key_mapper":{"k0":"productName",...},"data":{"k0":"Terra-LLM",...}}`
 2.  You **MUST** use the `key_mapper` object to de-tokenize the keys in the `data` object before you can interpret the schema. The `key_mapper` is a dictionary where the keys are the short tokens (e.g., `k0`) and the values are the original, human-readable keys (e.g., `productName`).
 3.  Recursively replace every token in the `data` object with its original value from the `key_mapper`.
 
-Once de-tokenized, the data will conform to the schemas described above.
+Once de-tokenized, the data will conform to the schemas described above.
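The de-tokenization steps above can be sketched as a small recursive helper. This is a hypothetical consumer-side sketch, not part of this commit: `detokenize` and `load_json` are illustrative names, and only the `key_mapper`/`data` contract described in the guide is assumed.

```python
import json


def detokenize(node, key_mapper):
    """Recursively replace token keys (e.g. 'k0') with their original names."""
    if isinstance(node, dict):
        return {key_mapper.get(k, k): detokenize(v, key_mapper)
                for k, v in node.items()}
    if isinstance(node, list):
        return [detokenize(item, key_mapper) for item in node]
    return node  # scalars pass through unchanged


def load_json(path):
    """Load a JSON file, de-tokenizing it first if it was shrunken."""
    with open(path, 'r', encoding='utf-8') as f:
        payload = json.load(f)
    # Shrunken files carry 'key_mapper' and 'data' at the root.
    if isinstance(payload, dict) and 'key_mapper' in payload and 'data' in payload:
        return detokenize(payload['data'], payload['key_mapper'])
    return payload  # already plain JSON
```

Note that `detokenize` also descends into lists, since the Mid-level `data` payload is an array of objects rather than a single object.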

+ 4 - 4
Assets/LLM/source/Documentation/2_MidLevel_Data_Guide.md

@@ -1,4 +1,4 @@
-# LLM Guide: Mid-Level Project Data (`MidLevel.zip`)
+# LLM Guide: Mid-Level Project Data (`MidLevel/`)
 
 ## 1. Purpose
 
@@ -13,7 +13,7 @@ This data level provides a **structural and relational view** of the project's a
 
 ## 2. File Structure
 
-The zip file will contain a replicated `Assets/` directory and a `GuidMappers/` directory.
+The `MidLevel` folder will contain a replicated `Assets/` directory and a `GuidMappers/` directory.
 
 ```
 MidLevel/
@@ -95,11 +95,11 @@ Each `.unity` and `.prefab` file is converted into a single JSON file that descr
 
 ## 4. Handling Shrunken / Tokenized Data
 
-The JSON files in this zip may have been processed by `json_reducer.py`. You can identify this if the root object contains a `key_mapper` and a `data` key.
+The JSON files in this folder may have been processed by `json_reducer.py`. You can identify this if the root object contains a `key_mapper` and a `data` key.
 
 **If the data is shrunken:**
 1.  The file content will look like this: `{"key_mapper":{"k0":"fileID",...},"data":[{"k0":"14854",...}]}`
 2.  You **MUST** use the `key_mapper` object to de-tokenize the keys in the `data` object before you can interpret the schema.
 3.  The `json_reducer` also removes keys with empty values (e.g., an empty `children` array `[]` will be removed). If a key like `children` is missing from a de-tokenized object, you can assume its value was empty.
 
-Once de-tokenized, the data will conform to the schemas described above.
+Once de-tokenized, the data will conform to the schemas described above.
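Because `json_reducer` drops empty values (step 3 above), a consumer should treat a missing key as "was empty" rather than as an error. A minimal sketch of that convention, with a hypothetical helper name and assuming only the behavior the guide describes:

```python
def get_children(node):
    """Return a node's children, treating a stripped (missing) key as empty.

    json_reducer removes empty containers, so absence means the value
    was an empty list, not that the data is malformed.
    """
    return node.get("children", [])
```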

+ 4 - 4
Assets/LLM/source/Documentation/3_LowLevel_Data_Guide.md

@@ -1,4 +1,4 @@
-# LLM Guide: Low-Level Project Data (`LowLevel.zip`)
+# LLM Guide: Low-Level Project Data (`LowLevel/`)
 
 ## 1. Purpose
 
@@ -11,7 +11,7 @@ This data level provides the **most granular, detailed, and raw view** of the pr
 
 ## 2. File Structure
 
-The zip file contains a highly detailed breakdown of the project, where scenes and prefabs are exploded into their constituent GameObjects.
+The `LowLevel` folder contains a highly detailed breakdown of the project, where scenes and prefabs are exploded into their constituent GameObjects.
 
 ```
 LowLevel/
@@ -79,11 +79,11 @@ This file contains a simple JSON list of the `fileID`s for all the root (top-lev
 
 ## 4. Handling Shrunken / Tokenized Data
 
-The JSON files in this zip may have been processed by `json_reducer.py`.
+The JSON files in this folder may have been processed by `json_reducer.py`.
 
 **If the data is shrunken:**
 1.  The file content will look like this: `{"key_mapper":{"k0":"GameObject",...},"data":{"k0":{...}}}`
 2.  You **MUST** use the `key_mapper` to de-tokenize the keys in the `data` object before you can interpret the schema.
 3.  The `json_reducer` also removes keys with empty values. If a key is missing from a de-tokenized object, you can assume its value was empty.
 
-Once de-tokenized, the data will conform to the schemas described above.
+Once de-tokenized, the data will conform to the schemas described above.

+ 5 - 5
Assets/LLM/source/Documentation/4_Query_Strategy_Guide.md

@@ -6,7 +6,7 @@ The project data is split into three levels of detail (High, Mid, Low) to enable
 
 **The workflow is always: High -> Mid -> Low.** Never start with the Low-level data unless you have a very specific `fileID` you need to look up.
 
-## 2. The Triage Process: Which Zip to Use?
+## 2. The Triage Process: Which Folder to Use?
 
 When you receive a user query, use the following decision process to select the correct data source.
 
@@ -15,7 +15,7 @@ When you receive a user query, use the following decision process to select the
 First, check if the question is about the project's global configuration, dependencies, or a general summary.
 
 - **Keywords:** "settings", "package", "version", "scenes in build", "tags", "layers", "render pipeline".
-- **Action:** Use **`HighLevel.zip`**.
+- **Action:** Use the **`HighLevel/`** folder.
 - **Example Queries:**
     - "What is the name of the game?" -> Read `manifest.json`.
     - "Which version of the Universal Render Pipeline is installed?" -> Read `packages.json`.
@@ -26,7 +26,7 @@ First, check if the question is about the project's global configuration, depend
 If the question is about the arrangement of objects within a scene/prefab, what components an object has, or where assets are, the Mid-level data is the correct source.
 
 - **Keywords:** "hierarchy", "list objects", "what components", "find all prefabs with", "children of", "location of".
-- **Action:** Use **`MidLevel.zip`**.
+- **Action:** Use the **`MidLevel/`** folder.
 - **Example Queries:**
     - "Show me the hierarchy of `SampleScene.unity`." -> Read `Assets/Scenes/SampleScene.json` and format the nested structure.
     - "What components are on the 'Player' GameObject?" -> Find the 'Player' object in the relevant scene/prefab JSON and list its `components`.
@@ -38,7 +38,7 @@ If the question is about the arrangement of objects within a scene/prefab, what
 Only proceed to this step if the user asks for a specific detail that is not present in the Mid-level hierarchy view. You should ideally already have the `fileID` of the target object from a previous Mid-level query.
 
 - **Keywords:** "exact position", "specific value", "rotation", "scale", "property of", "what is the speed".
-- **Action:** Use **`LowLevel.zip`**.
+- **Action:** Use the **`LowLevel/`** folder.
 - **Example Queries:**
     - "What are the exact local position coordinates of the 'Gun' object in the `Player` prefab?"
         1.  **Mid-Level:** First, open `Assets/CustomGame/Prefabs/Player.json`. Find the "Gun" object in the hierarchy to get its `fileID` (e.g., `101112`).
@@ -49,4 +49,4 @@ Only proceed to this step if the user asks for a specific detail that is not pre
 
 ## 4. De-Tokenization Reminder
 
-For all JSON files, **always check for tokenization first**. If the file contains a `key_mapper`, you must de-tokenize the `data` payload before attempting to process it according to the schemas and strategies outlined above.
+For all JSON files, **always check for tokenization first**. If the file contains a `key_mapper`, you must de-tokenize the `data` payload before attempting to process it according to the schemas and strategies outlined above.
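The triage process in this guide can be sketched as a simple keyword router. This is an illustrative sketch only: the keyword lists are taken from the guide, `triage` is a hypothetical helper, and the fallback to the High level mirrors the "always start High" workflow.

```python
# Keyword lists from the triage guide, checked in workflow order.
HIGH_KEYWORDS = ("settings", "package", "version", "scenes in build",
                 "tags", "layers", "render pipeline")
MID_KEYWORDS = ("hierarchy", "list objects", "what components",
                "find all prefabs with", "children of", "location of")
LOW_KEYWORDS = ("exact position", "specific value", "rotation",
                "scale", "property of", "what is the speed")


def triage(query):
    """Pick the data folder to consult for a user query (High -> Mid -> Low)."""
    q = query.lower()
    if any(k in q for k in HIGH_KEYWORDS):
        return "HighLevel/"
    if any(k in q for k in MID_KEYWORDS):
        return "MidLevel/"
    if any(k in q for k in LOW_KEYWORDS):
        return "LowLevel/"
    # No keyword matched: default to the broadest (High) view.
    return "HighLevel/"
```

A real router would need fuzzier matching than substring checks, but the ordering (High before Mid before Low) is the important part of the strategy.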

+ 41 - 13
Assets/LLM/source/orchestrators/extract_high_level.py

@@ -19,6 +19,7 @@ def parse_physics_settings(input_dir, project_mode):
     """
     Parses the appropriate physics settings file based on the project mode.
     """
+    print(f"    -> Analyzing {project_mode} physics settings...")
     physics_data = {}
     if project_mode == "3D":
         asset_path = input_dir / "ProjectSettings" / "DynamicsManager.asset"
@@ -32,6 +33,8 @@ def parse_physics_settings(input_dir, project_mode):
                 physics_data['layerCollisionMatrix'] = settings.get('m_LayerCollisionMatrix')
                 physics_data['autoSimulation'] = settings.get('m_AutoSimulation')
                 physics_data['autoSyncTransforms'] = settings.get('m_AutoSyncTransforms')
+        else:
+            print("       ...DynamicsManager.asset not found.")
     else: # 2D
         asset_path = input_dir / "ProjectSettings" / "Physics2DSettings.asset"
         if asset_path.is_file():
@@ -44,6 +47,8 @@ def parse_physics_settings(input_dir, project_mode):
                 physics_data['layerCollisionMatrix'] = settings.get('m_LayerCollisionMatrix')
                 physics_data['autoSimulation'] = settings.get('m_AutoSimulation')
                 physics_data['autoSyncTransforms'] = settings.get('m_AutoSyncTransforms')
+        else:
+            print("       ...Physics2DSettings.asset not found.")
 
     return physics_data
 
@@ -51,12 +56,13 @@ def parse_project_settings(input_dir, output_dir, indent=None, shrink=False, ign
     """
     Parses various project settings files to create a comprehensive manifest.
     """
-    print("\n--- Starting Task 2: Comprehensive Project Settings Parser ---")
-    
     manifest_data = {}
+    print("--> Generating GUID to Path map...")
     guid_map = create_guid_to_path_map(str(input_dir), ignored_folders=ignored_folders)
+    print("    ...GUID map generated.")
 
     # --- ProjectSettings.asset ---
+    print("--> Parsing ProjectSettings.asset...")
     project_settings_path = input_dir / "ProjectSettings" / "ProjectSettings.asset"
     if project_settings_path.is_file():
         docs = load_unity_yaml(str(project_settings_path))
@@ -127,16 +133,22 @@ def parse_project_settings(input_dir, output_dir, indent=None, shrink=False, ign
 
             manifest_data['scriptingBackend'] = final_scripting_backends
             manifest_data['apiCompatibilityLevel'] = final_api_levels
+    else:
+        print("    ...ProjectSettings.asset not found.")
 
     # --- EditorSettings.asset for 2D/3D Mode ---
+    print("--> Parsing EditorSettings.asset...")
     editor_settings_path = input_dir / "ProjectSettings" / "EditorSettings.asset"
     if editor_settings_path.is_file():
         docs = load_unity_yaml(str(editor_settings_path))
         if docs:
             editor_settings = convert_to_plain_python_types(docs[0]).get('EditorSettings', {})
             manifest_data['projectMode'] = "2D" if editor_settings.get('m_DefaultBehaviorMode') == 1 else "3D"
+    else:
+        print("    ...EditorSettings.asset not found.")
 
     # --- GraphicsSettings.asset for Render Pipeline ---
+    print("--> Parsing GraphicsSettings.asset...")
     graphics_settings_path = input_dir / "ProjectSettings" / "GraphicsSettings.asset"
     manifest_data['renderPipeline'] = 'Built-in'
     if graphics_settings_path.is_file():
@@ -153,9 +165,12 @@ def parse_project_settings(input_dir, output_dir, indent=None, shrink=False, ign
                     if "URP" in asset_path: manifest_data['renderPipeline'] = 'URP'
                     elif "HDRP" in asset_path: manifest_data['renderPipeline'] = 'HDRP'
                     else: manifest_data['renderPipeline'] = 'Scriptable'
+    else:
+        print("    ...GraphicsSettings.asset not found.")
 
 
     # --- TagManager.asset ---
+    print("--> Parsing TagManager.asset...")
     tag_manager_path = input_dir / "ProjectSettings" / "TagManager.asset"
     if tag_manager_path.is_file():
         docs = load_unity_yaml(str(tag_manager_path))
@@ -165,8 +180,11 @@ def parse_project_settings(input_dir, output_dir, indent=None, shrink=False, ign
             layers_list = tag_manager.get('layers', [])
             # Only include layers that have a name, preserving their index
             manifest_data['layers'] = {i: name for i, name in enumerate(layers_list) if name}
+    else:
+        print("    ...TagManager.asset not found.")
 
     # --- EditorBuildSettings.asset ---
+    print("--> Parsing EditorBuildSettings.asset...")
     build_settings_path = input_dir / "ProjectSettings" / "EditorBuildSettings.asset"
     if build_settings_path.is_file():
         docs = load_unity_yaml(str(build_settings_path))
@@ -176,8 +194,11 @@ def parse_project_settings(input_dir, output_dir, indent=None, shrink=False, ign
                 {'path': scene.get('path'), 'enabled': scene.get('enabled') == 1}
                 for scene in build_settings.get('m_Scenes', [])
             ]
+    else:
+        print("    ...EditorBuildSettings.asset not found.")
 
     # --- TimeManager.asset ---
+    print("--> Parsing TimeManager.asset...")
     time_manager_path = input_dir / "ProjectSettings" / "TimeManager.asset"
     if time_manager_path.is_file():
         docs = load_unity_yaml(str(time_manager_path))
@@ -190,15 +211,18 @@ def parse_project_settings(input_dir, output_dir, indent=None, shrink=False, ign
                 'm_TimeScale': time_manager.get('m_TimeScale'),
                 'Maximum Particle Timestep': time_manager.get('Maximum Particle Timestep')
             }
+    else:
+        print("    ...TimeManager.asset not found.")
 
     # --- Physics Settings ---
+    print("--> Parsing physics settings...")
     manifest_data['physicsSettings'] = parse_physics_settings(input_dir, manifest_data.get('projectMode', '3D'))
 
     # --- Write manifest.json ---
     manifest_output_path = output_dir / "manifest.json"
     try:
         write_json(manifest_data, manifest_output_path, indent=indent, shrink=shrink)
-        print(f"Successfully created manifest.json at {manifest_output_path}")
+        print(f"--> Successfully created manifest.json at {manifest_output_path}")
     except Exception as e:
         print(f"Error writing to {manifest_output_path}. {e}", file=sys.stderr)
 
@@ -207,18 +231,17 @@ def parse_package_manifests(input_dir, output_dir, indent=None, shrink=False):
     """
     Parses the primary package manifest and creates a clean packages.json file.
     """
-    print("\n--- Starting Task 3: Package Manifest Extractor ---")
-    
     manifest_path = input_dir / "Packages" / "manifest.json"
 
     if manifest_path.is_file():
         try:
+            print(f"--> Found package manifest at {manifest_path}")
             with open(manifest_path, 'r', encoding='utf-8') as f:
                 packages_data = json.load(f)
 
             packages_output_path = output_dir / "packages.json"
             write_json(packages_data, packages_output_path, indent=indent, shrink=shrink)
-            print(f"Successfully created packages.json at {packages_output_path}")
+            print(f"--> Successfully created packages.json at {packages_output_path}")
 
         except (IOError, json.JSONDecodeError) as e:
             print(f"Error processing {manifest_path}: {e}", file=sys.stderr)
@@ -257,17 +280,22 @@ def main():
         print(f"Error: Could not create output directory '{high_level_output_dir}'. {e}", file=sys.stderr)
         sys.exit(1)
 
+    print("\n--- Running High-Level Extraction ---")
+
+    print("\n[1/2] Parsing project settings...")
     parse_project_settings(
-        input_dir, 
-        high_level_output_dir, 
-        indent=indent_level, 
-        shrink=shrink_json, 
+        input_dir,
+        high_level_output_dir,
+        indent=indent_level,
+        shrink=shrink_json,
         ignored_folders=ignored_folders
     )
+
+    print("\n[2/2] Parsing package manifests...")
     parse_package_manifests(
-        input_dir, 
-        high_level_output_dir, 
-        indent=indent_level, 
+        input_dir,
+        high_level_output_dir,
+        indent=indent_level,
         shrink=shrink_json
     )
 

+ 49 - 21
Assets/LLM/source/orchestrators/extract_low_level.py

@@ -11,41 +11,54 @@ sys.path.append(str(utils_path))
 from config_utils import load_config
 
 def run_subprocess(script_name, input_dir, output_dir, indent=None, shrink=False, ignored_folders=None):
-    """Helper function to run a parser subprocess."""
+    """Helper function to run a parser subprocess and stream its output."""
     script_path = Path(__file__).parent.parent / "parsers" / script_name
     command = [
         sys.executable,
         str(script_path),
-        "--input",
-        str(input_dir),
-        "--output",
-        str(output_dir)
+        "--input", str(input_dir),
+        "--output", str(output_dir)
     ]
-    
-    # Pass indent and shrink arguments to subparsers.
-    # This assumes the subparsers have been updated to accept them.
+
     if indent is not None:
         command.extend(["--indent", str(indent)])
     if shrink:
         command.append("--shrink-json")
-    
     if ignored_folders:
         command.extend(["--ignored-folders", json.dumps(ignored_folders)])
 
+    # For logging, create a display-friendly version of the command
+    display_command = ' '.join(f'"{c}"' if ' ' in c else c for c in command)
+    print(f"--> Executing: {display_command}")
+
     try:
-        result = subprocess.run(command, check=True, text=True, capture_output=True, encoding='utf-8')
-        if result.stdout:
-            print(result.stdout)
-        if result.stderr:
-            print(result.stderr, file=sys.stderr)
-
-    except subprocess.CalledProcessError as e:
-        print(f"--- ERROR in {script_name} ---")
-        print(e.stdout)
-        print(e.stderr, file=sys.stderr)
-        print(f"--- End of Error ---")
+        # Use Popen to stream output in real time. stderr is merged into
+        # stdout so a single read loop cannot deadlock: draining the two
+        # pipes sequentially blocks if the child fills one pipe while the
+        # other is being read.
+        process = subprocess.Popen(
+            command,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.STDOUT,
+            text=True,
+            encoding='utf-8',
+            bufsize=1  # Line-buffered
+        )
+
+        # Stream combined output
+        for line in process.stdout:
+            print(line, end='')
+
+        process.wait()  # Wait for the subprocess to finish
+
+        if process.returncode != 0:
+            print(f"--- ERROR: {script_name} exited with code {process.returncode} ---", file=sys.stderr)
+
     except FileNotFoundError:
         print(f"--- ERROR: Script not found at {script_path} ---", file=sys.stderr)
+    except Exception as e:
+        print(f"--- An unexpected error occurred while running {script_name}: {e} ---", file=sys.stderr)
 
 
 def main():
@@ -77,12 +90,27 @@
     print(f"Output will be saved to: {low_level_output_dir}")
 
     # --- Run Extraction Pipeline ---
-    # Pass all relevant config options to the subparsers.
+    print("\n--- Running Low-Level Extraction Pipeline ---")
+
+    print("\n[1/5] Starting: Copy scripts...")
     run_subprocess("copy_scripts.py", input_dir, low_level_output_dir, indent_level, shrink_json, ignored_folders)
+    print("--> Finished: Copy scripts.")
+
+    print("\n[2/5] Starting: Copy shaders...")
     run_subprocess("copy_shaders.py", input_dir, low_level_output_dir, indent_level, shrink_json, ignored_folders)
+    print("--> Finished: Copy shaders.")
+
+    print("\n[3/5] Starting: Parse project settings...")
     run_subprocess("parse_project_settings.py", input_dir, low_level_output_dir, indent_level, shrink_json, ignored_folders)
+    print("--> Finished: Parse project settings.")
+
+    print("\n[4/5] Starting: Parse generic assets...")
     run_subprocess("parse_generic_assets.py", input_dir, low_level_output_dir, indent_level, shrink_json, ignored_folders)
+    print("--> Finished: Parse generic assets.")
+
+    print("\n[5/5] Starting: Parse scenes and prefabs...")
     run_subprocess("parse_scenes_and_prefabs.py", input_dir, low_level_output_dir, indent_level, shrink_json, ignored_folders)
+    print("--> Finished: Parse scenes and prefabs.")
 
     print("\nLow-level extraction pipeline complete.")
 

+ 48 - 12
Assets/LLM/source/orchestrators/extract_mid_level.py

@@ -18,14 +18,14 @@ def generate_guid_mappers(input_dir, output_dir, indent=None, shrink=False, igno
     """
     Finds all .meta files and generates JSON files mapping GUIDs to asset paths.
     """
-    print("\n--- Starting GUID Mapper Generation ---")
     assets_dir = input_dir / "Assets"
     if not assets_dir.is_dir():
         print(f"Error: 'Assets' directory not found in '{input_dir}'", file=sys.stderr)
         return

+    print("--> Finding all .meta files...")
     meta_files = find_files_by_extension(str(assets_dir), '.meta', ignored_folders=ignored_folders, project_root=input_dir)
-    print(f"Found {len(meta_files)} .meta files to process.")
+    print(f"--> Found {len(meta_files)} .meta files to process.")

     asset_type_map = {
         '.prefab': 'prefabs', '.unity': 'scenes', '.mat': 'materials',
@@ -36,7 +36,12 @@ def generate_guid_mappers(input_dir, output_dir, indent=None, shrink=False, igno
     guid_maps = {value: {} for value in asset_type_map.values()}
     guid_maps['others'] = {}

-    for meta_file_path_str in meta_files:
+    print("--> Parsing .meta files and mapping GUIDs to asset paths...")
+    total_files = len(meta_files)
+    for i, meta_file_path_str in enumerate(meta_files):
+        if (i + 1) % 250 == 0:
+            print(f"    ...processed {i+1}/{total_files} .meta files...")
+
         meta_file_path = Path(meta_file_path_str)
         asset_file_path = Path(meta_file_path_str.rsplit('.meta', 1)[0])

@@ -60,17 +65,20 @@ def generate_guid_mappers(input_dir, output_dir, indent=None, shrink=False, igno
             # Use the full path from the project root for the guid_map value
             full_asset_path = input_dir / asset_file_path
             guid_maps[asset_type][guid] = full_asset_path.as_posix()
+    print(f"    ...finished processing all {total_files} .meta files.")

     mappers_dir = output_dir / "GuidMappers"
     try:
         mappers_dir.mkdir(parents=True, exist_ok=True)
+        print(f"--> Writing GUID mapper files to {mappers_dir}...")
         for asset_type, guid_map in guid_maps.items():
             if guid_map:
                 output_path = mappers_dir / f"{asset_type}.json"
                 # For the output JSON, we still want the project-relative path
                 relative_guid_map = {g: Path(p).relative_to(input_dir).as_posix() for g, p in guid_map.items()}
                 write_json(relative_guid_map, output_path, indent=indent, shrink=shrink)
-        print(f"Successfully created GUID mappers in {mappers_dir}")
+                print(f"    -> Created {asset_type}.json with {len(guid_map)} entries.")
+        print(f"--> Successfully created all GUID mappers.")
     except OSError as e:
         print(f"Error: Could not create GUID mapper directory or files. {e}", file=sys.stderr)
     
@@ -131,12 +139,15 @@ def main():
         print(f"Warning: 'Assets' directory not found in '{input_dir}'. Skipping all processing.", file=sys.stderr)
         return

+    print("\n--- Running Mid-Level Extraction ---")
+
     # --- Task 1: Replicate 'Assets' directory structure ---
-    print(f"\n--- Replicating 'Assets' directory structure ---")
+    print("\n[1/3] Replicating 'Assets' directory structure...")
     replicate_directory_structure(str(assets_dir), str(output_assets_dir), ignored_folders=ignored_folders, project_root=input_dir)
-    print("Directory structure replication complete.")
+    print("--> Directory structure replication complete.")

     # --- Task 2: Generate GUID Map and Mappers ---
+    print("\n[2/3] Generating GUID Mappers...")
     guid_map = generate_guid_mappers(
         input_dir, 
         mid_level_output_dir, 
@@ -144,32 +155,57 @@ def main():
         shrink=shrink_json, 
         ignored_folders=ignored_folders
     )
+    print("--> GUID Mapper generation complete.")

     # --- Task 3: Orchestrate Scene and Prefab Parsing for Hierarchy ---
-    print("\n--- Starting Scene/Prefab Hierarchy Parsing ---")
+    print("\n[3/3] Parsing Scene and Prefab Hierarchies...")
     
+    print("--> Finding scene and prefab files...")
     scene_files = find_files_by_extension(str(assets_dir), '.unity', ignored_folders=ignored_folders, project_root=input_dir)
     prefab_files = find_files_by_extension(str(assets_dir), '.prefab', ignored_folders=ignored_folders, project_root=input_dir)
     files_to_process = scene_files + prefab_files
     
-    print(f"Found {len(files_to_process)} scene/prefab files to process for hierarchy.")
+    print(f"--> Found {len(files_to_process)} total scene/prefab files to process.")

-    for file_path_str in files_to_process:
+    total_files = len(files_to_process)
+    for i, file_path_str in enumerate(files_to_process):
         file_path = Path(file_path_str)
         
         relative_path = file_path.relative_to(assets_dir)
         output_json_path = (output_assets_dir / relative_path).with_suffix('.json')
         
         try:
-            print(f"\n--- Processing Hierarchy for: {file_path.name} ---")
+            print(f"\n--- Processing {file_path.name} ({i+1}/{total_files}) ---")
             
             # Use the sophisticated processor for building the visual tree
             processor = UnitySceneProcessor(guid_map)
-            hierarchy = processor.process_file(file_path)
+            
+            print(f"    -> Loading and parsing file...")
+            if not processor.load_documents(file_path):
+                print(f"Warning: Could not load or parse {file_path.name}. Skipping.", file=sys.stderr)
+                continue
+
+            print(f"    -> Pass 1/6: Building relationship maps and creating basic nodes...")
+            processor.process_first_pass()
+
+            print(f"    -> Pass 2/6: Building hierarchy relationships...")
+            processor.process_second_pass()
+
+            print(f"    -> Pass 3/6: Verifying and fixing parent-child relationships...")
+            processor.verification_pass()
+
+            print(f"    -> Pass 4/6: Extracting components...")
+            processor.process_third_pass()
+
+            print(f"    -> Pass 5/6: Merging prefab data...")
+            processor.merge_prefab_data_pass()
+            
+            print(f"    -> Pass 6/6: Assembling final hierarchy...")
+            hierarchy = processor.get_hierarchy()
             
             output_json_path.parent.mkdir(parents=True, exist_ok=True)
             write_json(hierarchy, output_json_path, indent=indent_level, shrink=shrink_json)
-            print(f"Successfully processed hierarchy for {file_path.name} -> {output_json_path}")
+            print(f"--> Successfully processed hierarchy for {file_path.name} -> {output_json_path}")

         except Exception as e:
             print(f"Error processing hierarchy for {file_path.name}: {e}", file=sys.stderr)

BIN
Assets/LLM/source/parsers/__pycache__/scene_processor.cpython-313.pyc


+ 19 - 7
Assets/LLM/source/parsers/parse_scenes_and_prefabs.py

@@ -46,10 +46,13 @@ def main():
     print(f"\n--- Starting Scene/Prefab Parsing ---")
     print(f"Found {len(files_to_process)} files to process.")

-    for file_path_str in files_to_process:
+    total_files = len(files_to_process)
+    for i, file_path_str in enumerate(files_to_process):
         file_path = Path(file_path_str)
-        print(f"\nProcessing: {file_path.name}")
+        print(f"\n--- Processing file {i+1}/{total_files}: {file_path.relative_to(input_dir)} ---")

+        # --- Step 1: Deep Parsing ---
+        print("  [1/2] Starting deep parse to extract all GameObjects...")
         gameobject_list = parse_scene_or_prefab(str(file_path))

         relative_path = file_path.relative_to(input_dir)
@@ -57,23 +60,30 @@ def main():
         asset_output_dir.mkdir(parents=True, exist_ok=True)

         if gameobject_list:
-            print(f"Saving {len(gameobject_list)} GameObjects to {asset_output_dir}")
+            print(f"    -> Deep parse complete. Found {len(gameobject_list)} GameObjects.")
+            print(f"    -> Saving individual GameObject files to {asset_output_dir}...")
             for go_data in gameobject_list:
                 file_id = go_data.get('fileID')
                 if file_id:
                     output_json_path = asset_output_dir / f"{file_id}.json"
                     write_json(go_data, output_json_path, indent=args.indent, shrink=args.shrink_json)
+            print(f"    -> Finished saving GameObject files.")
         else:
-            print(f"Skipped deep parsing for {file_path.name}.")
+            print(f"    -> Skipped deep parsing for {file_path.name} (no objects found or file was empty).")

+        # --- Step 2: Hierarchy Parsing ---
+        print("  [2/2] Starting hierarchy parse to identify root objects...")
         try:
             documents = load_unity_yaml(file_path)
             if not documents:
+                print("    -> Skipping hierarchy parse (file is empty or could not be read).")
                 continue

+            print("    -> Converting YAML for hierarchy analysis...")
             raw_object_map = {int(doc.anchor.value): doc for doc in documents if hasattr(doc, 'anchor') and doc.anchor is not None}
             object_map = {file_id: convert_to_plain_python_types(obj) for file_id, obj in raw_object_map.items()}

+            print("    -> Analyzing transform hierarchy to find roots...")
             parser = HierarchyParser(object_map)
             root_object_ids = parser.get_root_object_ids()
             
@@ -82,12 +92,14 @@ def main():
             if root_ids_list:
                 roots_output_path = asset_output_dir / "root_objects.json"
                 write_json(root_ids_list, roots_output_path, indent=args.indent, shrink=args.shrink_json)
-                print(f"Successfully saved root object list to {roots_output_path}")
+                print(f"    -> Successfully saved {len(root_ids_list)} root object IDs to {roots_output_path}")
+            else:
+                print("    -> No root objects found.")

         except Exception as e:
-            print(f"Error during hierarchy parsing for {file_path.name}: {e}", file=sys.stderr)
+            print(f"    -> Error during hierarchy parsing for {file_path.name}: {e}", file=sys.stderr)

-    print("Scene and prefab parsing complete.")
+    print("\nScene and prefab parsing complete.")

 if __name__ == "__main__":
     main()

+ 22 - 13
Assets/LLM/source/parsers/scene_processor.py

@@ -431,24 +431,19 @@ class UnitySceneProcessor:
             if 'children' in node and node['children']:
                 self.cleanup_pass(node['children'])

-    def process_file(self, file_path):
-        """Main processing method"""
-        # Load and parse the file
+    def load_documents(self, file_path):
+        """Loads and parses the yaml documents from a scene/prefab file."""
         documents = load_unity_yaml(file_path)
         if not documents:
-            return []
-
-        # Build object map
+            self.object_map = {}
+            return False
+        
         raw_object_map = {int(doc.anchor.value): doc for doc in documents if hasattr(doc, 'anchor') and doc.anchor is not None}
         self.object_map = {file_id: convert_to_plain_python_types(obj) for file_id, obj in raw_object_map.items()}
+        return True

-        # Process in passes
-        self.process_first_pass()
-        self.process_second_pass()
-        self.verification_pass()
-        self.process_third_pass()
-        self.merge_prefab_data_pass()
-
+    def get_hierarchy(self):
+        """Builds and returns the final, cleaned-up hierarchy from the processed data."""
         # Use the centralized parser to get the final, sorted root objects
         parser = HierarchyParser(self.object_map)
         root_object_ids = parser.get_root_object_ids()
@@ -465,6 +460,20 @@ class UnitySceneProcessor:

         return root_nodes

+    def process_file(self, file_path):
+        """Main processing method"""
+        if not self.load_documents(file_path):
+            return []
+
+        # Process in passes
+        self.process_first_pass()
+        self.process_second_pass()
+        self.verification_pass()
+        self.process_third_pass()
+        self.merge_prefab_data_pass()
+
+        return self.get_hierarchy()
+

 if __name__ == "__main__":
     # This script is intended to be used as a module.
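The `scene_processor.py` change above splits the monolithic `process_file` into `load_documents` (fallible input loading) and `get_hierarchy` (final assembly), so the orchestrator can log each pass individually while `process_file` remains a backward-compatible wrapper. A toy sketch of that refactor pattern (class and method names here are simplified stand-ins, not the real processor):

```python
class Processor:
    """Toy stand-in showing the load / passes / finalize split."""

    def __init__(self):
        self.data = None

    def load(self, source):
        # Like load_documents(): returns False so callers can skip bad input.
        if not source:
            self.data = []
            return False
        self.data = list(source)
        return True

    def first_pass(self):
        # Stand-in for the intermediate processing passes.
        self.data = [x * 2 for x in self.data]

    def finalize(self):
        # Like get_hierarchy(): assemble and return the final result.
        return sorted(self.data)

    def process(self, source):
        # Backward-compatible wrapper, like the retained process_file().
        if not self.load(source):
            return []
        self.first_pass()
        return self.finalize()
```

Callers that want per-step logging call `load`, the passes, and `finalize` directly; existing callers keep using `process` unchanged.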

BIN
Assets/LLM/source/utils/__pycache__/deep_parser.cpython-313.pyc


+ 10 - 0
Assets/LLM/source/utils/deep_parser.py

@@ -28,14 +28,17 @@ def parse_scene_or_prefab(file_path, guid_map=None, assets_dir=None):
     if not documents:
         return None

+    print("    -> Converting YAML documents to Python objects...")
     raw_object_map = {int(doc.anchor.value): doc for doc in documents if hasattr(doc, 'anchor') and doc.anchor is not None}
     object_map = {file_id: convert_to_plain_python_types(obj) for file_id, obj in raw_object_map.items()}
+    print(f"    -> Found {len(object_map)} objects in the file.")

     flat_gameobject_list = []
     transform_to_gameobject = {}
     parent_to_children_transforms = {}

     # First pass: Populate relationship maps
+    print("    -> Pass 1/2: Mapping object relationships...")
     for file_id, obj_data in object_map.items():
         if 'Transform' in obj_data:
             parent_id = obj_data['Transform'].get('m_Father', {}).get('fileID')
@@ -53,8 +56,15 @@ def parse_scene_or_prefab(file_path, guid_map=None, assets_dir=None):
                     break

     # Second pass: Process each GameObject
+    print("    -> Pass 2/2: Extracting GameObject data and components...")
+    total_gos = sum(1 for obj in object_map.values() if 'GameObject' in obj)
+    processed_gos = 0
     for file_id, obj_data in object_map.items():
         if 'GameObject' in obj_data:
+            processed_gos += 1
+            if processed_gos % 100 == 0:
+                print(f"        ...processed {processed_gos}/{total_gos} GameObjects...")
+
             go_info = obj_data['GameObject']
             
             source_obj_info = go_info.get('m_CorrespondingSourceObject')
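The "Pass 1/2: Mapping object relationships" step in `deep_parser.py` groups transforms under their parent's `fileID` before any GameObject extraction happens. A minimal sketch of that grouping, using plain dicts in place of real parsed Unity YAML documents (the input shape is simplified from what the parser actually sees):

```python
def build_parent_map(object_map):
    """Group transform fileIDs under their parent's fileID (0 = scene root)."""
    parent_to_children = {}
    for file_id, obj in object_map.items():
        transform = obj.get("Transform")
        if transform is None:
            continue  # objects without a Transform don't join the hierarchy
        # Unity stores the parent link as m_Father: {fileID: ...}; 0 means root.
        parent_id = transform.get("m_Father", {}).get("fileID", 0)
        parent_to_children.setdefault(parent_id, []).append(file_id)
    return parent_to_children
```

Building this lookup once up front lets the second pass attach children to each GameObject in O(1) instead of rescanning the whole object map per node.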