Crawler visualizer

Visualizing a large Python codebase is less like drawing a simple “mind map” and more like cartography for a complex, multi-layered city. A standard mind map has one central idea branching out. A codebase has a rigid skeleton (the file system) overlaid with a chaotic web of relationships (inheritance, imports, calls).

Part 1: The Prescription (The Blueprint)

This prescription defines the visual language and rules for generating the visualization. The goal is to balance the structural hierarchy with the relational complexity.

1. The Visual Metaphor: “Containers and Conduits”

We will not use a free-floating mind map. Instead, we will use a nested box layout (a treemap hybrid) as the substrate, representing the physical file structure on disk. On top of this rigid structure, we overlay colored curved lines representing relationships.

2. The Nodes (Containers)

Nodes represent distinct entities in the code. They should be nestable.

* Level 1: Directories (Folders)

* Visual: Large, light-grey rounded rectangles with thick borders. Label at the top left.

* Function: Grouping related modules.

* Level 2: Files (Modules)

* Visual: Solid white rectangles inside directory boxes. Label at the top.

* Function: The main container for code.

* Level 3: Classes

* Visual: Rectangles with a slight blue tint inside File boxes.

* Function: Structural blueprints.

* Level 4: Functions/Methods

* Visual: Small capsules or ovals inside Files or Classes.

* Function: The executable units.

3. The Edges (The Colored Lines)

These lines must be curved (Bezier curves) to avoid looking like a rigid circuit board and to easily cross over container boundaries. Arrows indicate direction.

* Type A: Inheritance (The “Is-A” Relationship)

* Color: BLUE (Bold/Solid)

* Direction: From Child Class -> Parent Class.

* Meaning: Defines the class hierarchy and polymorphism potential.

* Type B: Static Dependency (The “Needs” Relationship)

* Color: RED (Solid)

* Direction: From Importing File -> Imported File.

* Meaning: File A cannot run without File B existing (based on import … statements).

* Type C: Dynamic Reference (The “Uses” Relationship)

* Color: ORANGE (Dashed or thinner)

* Direction: From Function X -> Function Y (call), or Function X -> Class Z (instantiation).

* Meaning: Runtime execution flow.
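One way to hold the containers and conduits in memory is sketched below. All names here (NodeKind, EdgeKind, Node, Edge, CodeGraph) are hypothetical and chosen for illustration only; the sketch assumes Python 3.9 or newer for the built-in generic annotations.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class NodeKind(Enum):
    FOLDER = "folder"
    FILE = "file"
    CLASS = "class"
    FUNCTION = "function"


class EdgeKind(Enum):
    INHERITANCE = "blue"   # child class -> parent class
    IMPORT = "red"         # importing file -> imported file
    REFERENCE = "orange"   # caller -> called function or instantiated class


@dataclass
class Node:
    id: str                       # unique key, e.g. "src/models.py::Dog"
    kind: NodeKind
    label: str
    parent: Optional[str] = None  # id of the enclosing container, None for the root


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: EdgeKind


@dataclass
class CodeGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: set[Edge] = field(default_factory=set)

    def add_node(self, node: Node) -> None:
        # A nested node can only be added once its container exists.
        if node.parent is not None and node.parent not in self.nodes:
            raise ValueError(f"unknown parent {node.parent!r} for {node.id!r}")
        self.nodes[node.id] = node

    def add_edge(self, source: str, target: str, kind: EdgeKind) -> None:
        # Edges are stored in a set, so adding the same edge twice has no effect.
        self.edges.add(Edge(source, target, kind))


graph = CodeGraph()
graph.add_node(Node("src", NodeKind.FOLDER, "/src"))
graph.add_node(Node("src/models.py", NodeKind.FILE, "models.py", parent="src"))
graph.add_node(Node("src/models.py::Animal", NodeKind.CLASS, "Animal", parent="src/models.py"))
graph.add_node(Node("src/models.py::Dog", NodeKind.CLASS, "Dog", parent="src/models.py"))
graph.add_edge("src/models.py::Dog", "src/models.py::Animal", EdgeKind.INHERITANCE)
```

Storing the color as the enum value keeps the styling decision next to the relationship type, so a renderer can read edge.kind.value directly.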

4. Layout Rules

* Enclosure determines Position: A function node must physically reside inside its parent Class node, which must reside inside its parent File node.

* Minimize Crossings: The layout algorithm should arrange files within folders so that red and blue lines are as short as possible, which tends to reduce the number of crossings.

* Interactivity: Because “lots of files” means a massive image, the final output must be zoomable and pannable (like Google Maps). Clicking a node should highlight only its direct connections and fade out the rest of the graph.
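The click-to-highlight rule comes down to a neighbor lookup over the edge list. The following is a minimal sketch, assuming edges are plain (source, target) pairs; the function name direct_connections is hypothetical.

```python
def direct_connections(node_id, edges):
    """Return the ids of nodes joined to node_id by a single edge, in either direction."""
    connected = set()
    for source, target in edges:
        if source == node_id:
            connected.add(target)
        elif target == node_id:
            connected.add(source)
    return connected


edges = [
    ("Dog", "Animal"),
    ("run_app", "Dog"),
    ("run_app", "logger"),
    ("logger", "formatter"),
]

# Clicking "Dog" keeps Dog and its direct neighbors; everything else is faded.
highlighted = direct_connections("Dog", edges) | {"Dog"}
faded = {node for edge in edges for node in edge} - highlighted

print(sorted(highlighted))  # ['Animal', 'Dog', 'run_app']
print(sorted(faded))        # ['formatter', 'logger']
```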

Part 2: Visual Sample

Since I cannot render interactive graphics here, I will provide a stylized diagram. Imagine this rendered on a large canvas where you can zoom in.

Legend:

[ Box ] = Container

( Oval ) = Function/Method

==BLUE==> = Inheritance

==RED==> = Import Dependency

--ORANGE--> = Function Call/Reference

graph TD
    subgraph "Folder: /src"
        style src fill:#eee,stroke:#333,stroke-width:2px

        subgraph "File: models.py"
            style models.py fill:#fff,stroke:#666

            subgraph "Class: Animal"
                style Animal fill:#e6f0ff,stroke:#00f
                (method: speak)
            end

            subgraph "Class: Dog"
                style Dog fill:#e6f0ff,stroke:#00f
                (method: bark)
            end

            %% BLUE LINE: INHERITANCE
            Dog ==BLUE INHERITS FROM==> Animal
        end

        subgraph "File: utils.py"
            style utils.py fill:#fff,stroke:#666
            (func: logger)
            (func: formatter)

            %% ORANGE LINE: Internal call
            (func: logger) --ORANGE CALLS--> (func: formatter)
        end

        subgraph "File: main.py"
            style main.py fill:#fff,stroke:#666

            %% RED LINES: IMPORTS
            main.py ==RED IMPORTS==> models.py
            main.py ==RED IMPORTS==> utils.py

            (func: run_app)

            %% ORANGE LINES: DYNAMIC USAGE
            (func: run_app) --ORANGE INSTANTIATES--> Dog
            (func: run_app) --ORANGE CALLS--> (method: bark)
            (func: run_app) --ORANGE CALLS--> (func: logger)
        end
    end

 

Part 3: The Algorithm (How it works)

We need to perform Static Analysis. We will not run the code; we will read it like text, understand its grammar, and extract the relationships.

The tool for this in Python is the built-in ast (Abstract Syntax Tree) module. It turns Python code into a tree structure of objects representing syntax.
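To make "a tree structure of objects" concrete, here is a small sketch that parses a three-line snippet and inspects the resulting nodes. The snippet itself is made up for illustration.

```python
import ast

source = "class Dog(Animal):\n    def bark(self):\n        return 'woof'\n"
tree = ast.parse(source)  # nothing is executed; Animal does not need to exist

class_def = tree.body[0]  # first top-level statement in the module
print(type(class_def).__name__)                 # ClassDef
print(class_def.name)                           # Dog
print([base.id for base in class_def.bases])    # ['Animal']
print(type(class_def.body[0]).__name__)         # FunctionDef
print(class_def.body[0].name)                   # bark
```

Because the code is only parsed, the undefined name Animal causes no error, which is what makes this approach safe to run over an arbitrary codebase.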

Phase 1: The Crawl (Building the Skeleton)

Goal: Create the container nodes (Folders and Files).

* Start at the root project directory.

* Use os.walk() to recursively traverse directories.

* For every directory found, create a “Folder Node”. Record its path.

* For every file ending in .py, create a “File Node”. Record its path and link it as a child of its containing Folder Node.
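The crawl steps above can be sketched roughly as follows. The function name crawl and the SKIP_DIRS set are assumptions added for illustration; skipping directories such as .git is not part of the steps above, just a common practical choice.

```python
import os

# Assumption: directories that rarely contain project source worth drawing.
SKIP_DIRS = {".git", "__pycache__", ".venv", "node_modules"}


def crawl(root):
    """Return (folders, files): a list of folder paths and a map of .py path -> parent folder."""
    folders = []
    files = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        folders.append(dirpath)  # one Folder Node per directory
        for name in sorted(filenames):
            if name.endswith(".py"):
                # One File Node per module, linked to its containing folder.
                files[os.path.join(dirpath, name)] = dirpath
    return folders, files


if __name__ == "__main__":
    found_folders, found_files = crawl(".")
    print(len(found_folders), "folders,", len(found_files), "Python files")
```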

Phase 2: The Deep Parse (Populating Containers and Finding Blue/Red Lines)

Goal: Look inside every .py file found in Phase 1.

* Read File: Open the .py file and read its content into a string.

* Generate AST: Pass the string to ast.parse(content). This gives us the tree structure of that file.

* Walk the AST: We now traverse this tree looking for specific node types.

* Finding Classes & Methods (Populating Containers):

* Look for ast.ClassDef. Create a “Class Node”.

* Look for ast.FunctionDef (or AsyncFunctionDef). Create a “Function Node”.

* Crucial Step: Maintain context. If you find a FunctionDef inside the body of a ClassDef, mark that function as a method belonging to that class.

* Finding Imports (The RED Lines):

* Look for ast.Import and ast.ImportFrom nodes.

* Extract the module name being imported (e.g., from django.db import models -> imports django.db).

* Resolution Challenge: Try to map that import string back to one of the File Nodes created in Phase 1. If found, draw a RED line from current file to the target file.

* Finding Inheritance (The BLUE Lines):

* Look at ast.ClassDef nodes again.

* Inspect their bases attribute (e.g., class Dog(Animal): -> base is Animal).

* Resolution Challenge: You know the name is “Animal”, but is it defined in this file or imported? You must check the imports found earlier to resolve where “Animal” actually lives. Once resolved to a specific Class Node, draw a BLUE line from Child to Parent.
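A possible shape for the deep parse is an ast.NodeVisitor that keeps a scope stack, so that a FunctionDef found inside a ClassDef is recorded as a method. This is a sketch only: ModuleScanner and locate_base are hypothetical names, the sample source is invented, and ast.unparse requires Python 3.9 or newer.

```python
import ast


class ModuleScanner(ast.NodeVisitor):
    def __init__(self):
        self.scope = []      # stack of (kind, name) for enclosing classes/functions
        self.classes = {}    # qualified class name -> list of base expressions as text
        self.functions = []  # (qualified name, is_method)
        self.imports = {}    # local name -> (module, imported name or None)

    def _qualname(self, name):
        return ".".join([n for _, n in self.scope] + [name])

    def _enter(self, kind, node):
        self.scope.append((kind, node.name))
        self.generic_visit(node)  # visit the body with the new context on the stack
        self.scope.pop()

    def visit_ClassDef(self, node):
        self.classes[self._qualname(node.name)] = [ast.unparse(b) for b in node.bases]
        self._enter("class", node)

    def visit_FunctionDef(self, node):
        # A function is a method when its immediate container is a class.
        is_method = bool(self.scope) and self.scope[-1][0] == "class"
        self.functions.append((self._qualname(node.name), is_method))
        self._enter("function", node)

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Import(self, node):
        for alias in node.names:
            # "import a.b" binds the name "a"; "import a.b as c" binds "c".
            local = alias.asname or alias.name.split(".")[0]
            self.imports[local] = (alias.name, None)

    def visit_ImportFrom(self, node):
        module = "." * node.level + (node.module or "")  # keep leading dots of relative imports
        for alias in node.names:
            self.imports[alias.asname or alias.name] = (module, alias.name)


def locate_base(base, scanner):
    """Say where a base-class name comes from: this file, an import, or unknown."""
    if base in scanner.classes:
        return ("local", base)
    if base in scanner.imports:
        return ("imported",) + scanner.imports[base]
    return ("unresolved", base)


SOURCE = '''
from models import Animal
import os.path

class Dog(Animal):
    def bark(self):
        pass

def helper():
    pass
'''

scanner = ModuleScanner()
scanner.visit(ast.parse(SOURCE))
print(scanner.classes)    # {'Dog': ['Animal']}
print(scanner.functions)  # [('Dog.bark', True), ('helper', False)]
print(scanner.imports)    # {'Animal': ('models', 'Animal'), 'os': ('os.path', None)}
print(locate_base("Animal", scanner))  # ('imported', 'models', 'Animal')
```

Mapping the module string ("models" here) back to a File Node from Phase 1 is a separate step that depends on the project's package layout, so it is left out of this sketch.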

Phase 3: The Reference Hunt (The ORANGE Lines)

Goal: Find function calls and usages. This is the hardest part of static analysis because Python is dynamic. We will do a “best effort” approximation.

* Walk the AST inside every ast.FunctionDef.

* Look for ast.Call nodes. This is something like my_function().

* Identify the name of the thing being called.

* Attempt Resolution:

* Is it defined elsewhere in this file?

* Is it an imported name?

* If you can identify the target Function or Class Node with reasonable confidence, draw an ORANGE line from the current function to the target.
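The reference hunt could look something like the sketch below. The helper names (dotted_name, called_names, resolve) and the sample source are hypothetical. One simplification to note: ast.walk also descends into nested function definitions, so their calls are attributed to the outer function.

```python
import ast


def dotted_name(expr):
    """Turn Name/Attribute chains such as a.b.c into 'a.b.c'; return None for anything else."""
    if isinstance(expr, ast.Name):
        return expr.id
    if isinstance(expr, ast.Attribute):
        prefix = dotted_name(expr.value)
        return None if prefix is None else f"{prefix}.{expr.attr}"
    return None  # e.g. a call on a subscript or on another call's result


def called_names(func_node):
    """Collect the readable call targets found anywhere inside a function body."""
    names = set()
    for node in ast.walk(func_node):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is not None:
                names.add(name)
    return names


def resolve(name, local_defs, imports):
    """Best effort: same-file definitions first, then imported names, else give up."""
    if name in local_defs:
        return ("local", name)
    head = name.split(".")[0]
    if head in imports:
        return ("imported", imports[head])
    return None


SOURCE = '''
from models import Dog

def logger():
    pass

def run_app():
    d = Dog()
    d.bark()
    logger()
'''

tree = ast.parse(SOURCE)
local_defs = {n.name for n in tree.body if isinstance(n, (ast.FunctionDef, ast.ClassDef))}
imports = {"Dog": "models"}  # would normally come from the import scan in Phase 2
run_app = next(n for n in tree.body if isinstance(n, ast.FunctionDef) and n.name == "run_app")

for name in sorted(called_names(run_app)):
    print(name, "->", resolve(name, local_defs, imports))
# Dog -> ('imported', 'models')
# d.bark -> None
# logger -> ('local', 'logger')
```

The d.bark case shows the limit of the "best effort" approach: nothing in the syntax says that d is a Dog, so without type inference the call stays unresolved and no orange line is drawn for it.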

Phase 4: Visualization Generation

* Take all the Nodes (Folders, Files, Classes, Functions) and their hierarchical relationships.

* Take all the Edges (Blue, Red, Orange).

* Feed this data into a graph visualization library capable of compound nodes and directed edges (e.g., Graphviz with the dot engine, or a JavaScript library like Cytoscape.js or D3.js for interactive web views).

* Apply styling based on the prescription rules.
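For the Graphviz route, the nodes and edges can be serialized to DOT text with plain string formatting. The sketch below is a simplification under stated assumptions: the function name to_dot and its input shapes are hypothetical, folders and files are drawn as nested clusters, and classes and functions are drawn as ordinary nodes rather than containers.

```python
def to_dot(containers, leaves, edges):
    """Build DOT text.

    containers: {id: (label, parent_id or None)}  folders and files
    leaves:     {id: (label, container_id)}       classes and functions
    edges:      [(source_leaf, target_leaf, color)]
    """
    lines = ["digraph code {", "  node [shape=box, style=rounded];"]

    def emit(container_id, depth):
        pad = "  " * depth
        label, _ = containers[container_id]
        # Graphviz draws a subgraph as a box only when its name starts with "cluster".
        lines.append(f'{pad}subgraph "cluster_{container_id}" {{')
        lines.append(f'{pad}  label="{label}";')
        for leaf_id, (leaf_label, owner) in leaves.items():
            if owner == container_id:
                lines.append(f'{pad}  "{leaf_id}" [label="{leaf_label}"];')
        for child_id, (_, parent) in containers.items():
            if parent == container_id:
                emit(child_id, depth + 1)  # nested container
        lines.append(f"{pad}}}")

    for container_id, (_, parent) in containers.items():
        if parent is None:
            emit(container_id, 1)
    for source, target, color in edges:
        lines.append(f'  "{source}" -> "{target}" [color={color}];')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    containers={
        "src": ("/src", None),
        "models": ("models.py", "src"),
        "main": ("main.py", "src"),
    },
    leaves={
        "Animal": ("Animal", "models"),
        "Dog": ("Dog", "models"),
        "run": ("run", "main"),
    },
    edges=[("Dog", "Animal", "blue"), ("run", "Dog", "orange")],
)
print(dot)
```

The printed text can be saved to a file such as graph.dot and rendered with the Graphviz command line, for example dot -Tsvg graph.dot -o graph.svg. File-to-file red edges are omitted here because edges between clusters need extra handling in Graphviz (the compound graph attribute together with lhead and ltail on the edge).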

Part 4: Sample Step-by-Step Execution

Let’s trace a tiny subset of the algorithm on a reduced version of the Part 2 sample (utils.py is left out, the classes have no methods, and run_app is shortened to run).

Input: A folder /src containing main.py and models.py.

models.py content:

class Animal:
    pass

class Dog(Animal):
    pass

 

main.py content:

from models import Dog

def run():
    d = Dog()

 

Algorithm Execution:

* Phase 1 (Crawl):

* Created FolderNode: /src

* Created FileNode: /src/models.py (parent: /src)

* Created FileNode: /src/main.py (parent: /src)

* Phase 2 (Parse models.py):

* AST found ClassDef: Animal. Created ClassNode: Animal inside models.py. No bases.

* AST found ClassDef: Dog. Created ClassNode: Dog inside models.py.

* Found base: Animal. Resolved Animal to the node defined just above.

* Action: Created BLUE Edge from Dog node to Animal node.

* Phase 2 & 3 (Parse main.py):

* AST found ImportFrom: from models import Dog.

* Action: Resolved models to file /src/models.py. Created RED Edge from main.py to models.py. Recorded that the name “Dog” refers to the class in models.py.

* AST found FunctionDef: run. Created FunctionNode: run inside main.py.

* Inside run, AST found Call: Dog().

* Action: The system looks up “Dog”. It sees the import resolution from earlier. It knows “Dog” is the ClassNode in models.py. Created ORANGE Edge from function run node to class Dog node.

* Phase 4 (Render):

* Draw box for /src. Inside, draw boxes for models.py and main.py.

* Draw Animal and Dog boxes inside models.py. Draw a blue line from Dog to Animal.

* Draw run oval inside main.py.

* Draw red line between the two file boxes.

* Draw orange line from run oval crossing the file boundaries to the Dog class box.
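The trace above can be reproduced with a compact end-to-end sketch that writes the two sample files to a temporary folder and prints the edges it finds. It is deliberately narrow: the function name analyze is hypothetical, and it assumes a flat folder, top-level classes and functions only, and "from X import Y" style imports.

```python
import ast
import os
import tempfile

SAMPLE = {
    "models.py": "class Animal:\n    pass\n\nclass Dog(Animal):\n    pass\n",
    "main.py": "from models import Dog\n\ndef run():\n    d = Dog()\n",
}


def analyze(root):
    # Phase 1: find the .py files, keyed by module name (flat layout assumed).
    modules = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                modules[name[:-3]] = os.path.join(dirpath, name)

    trees = {}
    for mod, path in modules.items():
        with open(path, encoding="utf-8") as handle:
            trees[mod] = ast.parse(handle.read())

    # Phase 2: record which top-level classes each module defines.
    classes = {
        mod: {n.name for n in tree.body if isinstance(n, ast.ClassDef)}
        for mod, tree in trees.items()
    }

    edges = set()
    for mod, tree in trees.items():
        names = {cls: (mod, cls) for cls in classes[mod]}  # visible name -> (module, class)
        for node in tree.body:
            if isinstance(node, ast.ImportFrom) and node.module in modules:
                edges.add(("red", mod, node.module))
                for alias in node.names:
                    if alias.name in classes[node.module]:
                        names[alias.asname or alias.name] = (node.module, alias.name)
            elif isinstance(node, ast.ClassDef):
                for base in node.bases:
                    if isinstance(base, ast.Name) and base.id in names:
                        edges.add(("blue", f"{mod}.{node.name}", ".".join(names[base.id])))
            elif isinstance(node, ast.FunctionDef):
                # Phase 3: calls whose target is a known class name.
                for inner in ast.walk(node):
                    if (isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name)
                            and inner.func.id in names):
                        edges.add(("orange", f"{mod}.{node.name}", ".".join(names[inner.func.id])))
    return edges


with tempfile.TemporaryDirectory() as root:
    for filename, content in SAMPLE.items():
        with open(os.path.join(root, filename), "w", encoding="utf-8") as handle:
            handle.write(content)
    for edge in sorted(analyze(root)):
        print(edge)
# ('blue', 'models.Dog', 'models.Animal')
# ('orange', 'main.run', 'models.Dog')
# ('red', 'main', 'models')
```

The three printed edges match the blue, orange, and red lines produced in the trace.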