Git Blame for AI: Why Prompts Belong in Version Control
Image source: M Yusuf (Custom SVG)

When you write a program in English and an AI translates it into Python, which one is the true source code?

In the emerging era of "vibe coding"1, natural language prompts are becoming the primary interface between humans and machines. This shift introduces a new dilemma for software engineering: should we store these prompts alongside the code they generate, or discard them as transient artifacts once the code is written?

The Debate: Prompts in Code Review

The developer community is already grappling with this question. A recent poll by Gergely Orosz asked developers if they wanted to see the prompts that generated a pull request. The results were polarizing: nearly half (49%) were enthusiastic about the idea, while a quarter (24%) strongly opposed it. Meanwhile, the industry is moving ahead: tools like Cursor (which acquired Graphite) and internal initiatives at Meta are actively building workflows to publish and review prompts as part of the code.

We are in the early days of defining the norms for this AI-assisted reality.

Redefining "Source Code"

Traditionally, the distinction was clear: source code is what humans write, and machine code is what computers execute.

Human Code → Build → Artifact

For the end user, only the build matters. They download binaries or visit a website. They don't care about the source code, nor should they. Yet, source code is the essential input for development—sufficient to reproduce the build deterministically.

With vibe coding, the workflow changes. We translate natural language (the prompt) into a programming language (the code). If prompts are the true "source" of our intent, should we commit them instead of the Python, TypeScript, or Rust they generate? It's tempting to cut out the middleman and treat our instructions as the only source of truth. But that approach has significant flaws.

Builds must be deterministic. Code that only compiles when the stars align is useless. Modern software engineering relies on reproducibility: pinned dependencies, containerized environments, and CI/CD pipelines ensure that if it builds on my machine, it builds on yours.
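What "deterministic" means here can be made concrete with a minimal sketch: treat the build as a pure function of its inputs and fingerprint the artifact, so two machines can verify they produced byte-identical output. The `build` function below is a stand-in, not a real compiler.

```python
import hashlib

def build(source: str) -> bytes:
    # A stand-in "compiler": any pure function of its inputs is reproducible.
    return source.upper().encode("utf-8")

def artifact_digest(source: str) -> str:
    # Fingerprint the build output so independent machines can compare artifacts.
    return hashlib.sha256(build(source)).hexdigest()
```

The same source always hashes to the same digest, which is exactly the property prompt-based generation lacks.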

In contrast, generating code from prompts is inherently non-deterministic and difficult to replicate:

  • Probabilistic Nature: Even with temperature=0, most LLM APIs do not guarantee identical outputs for identical inputs across runs. Determinism in large language models remains an unsolved research problem.
  • Transient Models: Models are updated silently or deprecated entirely. Unlike a specific version of a library in package.json, we cannot rely on "GPT-4-snapshot-2025" existing forever.
  • Context Handling: LLMs don't just read the prompt; they consume rich context from the IDE—open files, conversation history, memory, and tool outputs. Capturing this entire state for reproducibility is incredibly complex.

Even simple tasks yield different results. I ran four parallel instances of Gemini 3 Pro in Cursor with the identical prompt: "Correct grammar in this post." Each instance produced a slightly different variation. In a large codebase, such variance could mean fixing a bug one day and reintroducing it the next.
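The variance above falls out of how decoding works. The toy sampler below illustrates the mechanism: at each step the model assigns probabilities to candidate tokens, and when two candidates are nearly tied, independent runs diverge. This is a deliberately simplified sketch, not how production inference engines work; real APIs add batching and floating-point effects that can vary outputs even at temperature=0.

```python
import math
import random

def sample_next_token(logits, rng, temperature=1.0):
    """Sample one token index from a softmax over logits (one toy decoder step)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(exps) - 1

# Same "prompt" (same logits), two independent runs with different seeds.
logits = [2.0, 1.9, 0.5]  # two near-equally likely candidate tokens
rng_a, rng_b = random.Random(1), random.Random(2)
run_a = [sample_next_token(logits, rng_a) for _ in range(20)]
run_b = [sample_next_token(logits, rng_b) for _ in range(20)]
```

Identical inputs, different token sequences: the "source" does not determine the "build".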

Prompts as Intent, Not Specification

Prompts are best understood as a form of specification, but a fuzzy one. They leave significant room for interpretation.

Natural language is ambiguous—a feature for human communication, but a bug for precise instruction. Even when prompts are detailed, there is often a gap between what we ask for and what the model delivers2. Current LLMs, while impressive, are far from infallible. They can misinterpret instructions that would be crystal clear to a junior developer.

Therefore, prompts should be treated as statements of intent and context notes from the development process, rather than as a reliable build input.

The Case for "Git Blame" on Prompts

I believe all AI contributions—both code changes and commit messages—should be clearly attributed. This isn't about devaluing AI work, but about providing essential context for troubleshooting and maintenance. As open-source projects increasingly require disclosure of AI contributions3, this transparency becomes standard practice.

It is crucial to differentiate between:

  • What was consciously intended by the developer.
  • What was a specific design decision.
  • What just "happened" because the model hallucinated or took a shortcut.

Tracking prompts alongside code commits offers several benefits:

  • Accelerated Learning: The field of AI coding is moving at breakneck speed. Seeing how peers prompt models allows us to learn new techniques and improve our own workflows—a rising tide lifts all boats.
  • Intent Verification: When a bug appears, reading the prompt can reveal why the code was written that way. Was the logic flawed, or did the model misunderstand?
  • Efficient Reviewing: Knowing a commit is AI-generated signals reviewers to look closer at specific areas. For instance, we might trust AI with boilerplate UI code but demand human-level scrutiny for authentication logic.
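One lightweight way to get these benefits is to record the prompt in the commit message itself, using git's trailer format. The sketch below formats such a message; the `AI-Model` and `AI-Prompt` trailer names are a hypothetical convention, not an established standard.

```python
def ai_commit_message(summary: str, body: str, model: str, prompt: str) -> str:
    """Format a commit message that attributes AI involvement via git-style
    trailers (AI-Model / AI-Prompt are a hypothetical naming convention)."""
    trailers = "\n".join([
        f"AI-Model: {model}",
        f"AI-Prompt: {prompt}",
    ])
    return f"{summary}\n\n{body}\n\n{trailers}"

msg = ai_commit_message(
    summary="Add retry logic to payment client",
    body="Generated with AI assistance; reviewed by a human.",
    model="example-model-2025",
    prompt="Add exponential backoff to the payment client's HTTP calls.",
)
```

Because trailers use the key-value format git already understands, `git log --format='%(trailers:key=AI-Prompt)'` can then surface the originating prompt for any commit, giving "git blame for prompts" with no new infrastructure.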

Challenges and Reservations

Despite the benefits, archiving prompts introduces real friction, both technical and human. Writing prompts is often a messy, creative process.

Technological Hurdles

  • Dirty Notebooks: Prompts are often stream-of-consciousness, full of typos and unfinished thoughts.
  • Privacy Risks: They might accidentally contain API keys, passwords, or PII.

Social Factors

  • Profanity & Frustration: We are often less polite to AI than colleagues (frustration is real!).
  • "Imposter Syndrome": Using AI can sometimes feel less "earned," leading developers to hide their usage.
  • Peer Pressure: Fear of having work rejected due to the "AI slop" stigma.

To address this, we need tooling with redaction and curation capabilities. Just as we squash messy commits before merging to main, we should be able to curate our prompts into a clean, public history.
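The redaction half of that tooling can be sketched as a pattern-based scrubber that runs before a prompt ever enters the repository. The patterns below are illustrative only; a real tool would need a much broader ruleset, including entropy-based secret detection.

```python
import re

# Illustrative patterns only; real secret scanners use far larger rulesets.
PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"(?i)password\s*[:=]\s*\S+"), "[REDACTED_PASSWORD]"),
]

def redact(prompt: str) -> str:
    """Scrub obvious secrets and PII from a prompt before it is committed."""
    for pattern, replacement in PATTERNS:
        prompt = pattern.sub(replacement, prompt)
    return prompt
```

Wired into a pre-commit hook, a scrubber like this would let developers archive prompts without hand-auditing every line for leaked credentials.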

Conclusion

Code reviews are evolving, and the integration of AI is inevitable. We need standards—like MCP and SKILL.md—to govern how we share prompts alongside our git history.

In the meantime, start simple: if you use AI to write the code, use AI to write the commit message. And if you're feeling brave, include the prompt that started it all.

It is frustrating to see dozens of AI-generated files committed with a lazy "fixed it" message. We deserve better. If a tool allows vibe coding, it must also enable vibe committing.