Notes on Using Codex

2026-01-12

Codex becomes much more useful when I stop treating it like a one-off chatbot and start treating it like a teammate that can be configured, evaluated, and improved over time.

This note is based on OpenAI’s Codex best practices and my own workflow building this blog. The most important lesson is simple: Codex does better work when it has the right task context, a clear definition of done, reusable project guidance, and permission to verify its own output.

1. Start With the Right Task Shape

A good Codex request should describe four things:

- the goal
- the relevant context
- the constraints
- what counts as done

I like using this prompt shape:

Goal:
Update the Codex article with a more practical workflow.

Context:
- The post is source/_posts/codex-notes.md.
- The theme supports copyable code blocks.
- Follow .spec/content-editing.md.

Constraints:
- Keep the article in English.
- Do not change unrelated pages.
- Preserve the existing date and tags.

Done when:
- npm run clean && npm run build passes.
- The generated article contains code examples.
- The change is committed.

The value of this format is not ceremony. It removes ambiguity: Codex searches the repository with more direction, makes fewer assumptions, and finishes with a concrete verification step.
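Because the shape is so regular, it can be generated rather than retyped. The following is a minimal Python sketch, not part of Codex itself: `build_prompt` and its parameter names are hypothetical, and the only behavior it adds is refusing to build a request that has no definition of done.

```python
def build_prompt(goal, context, constraints, done_when):
    """Render a four-part task prompt. All names here are illustrative."""
    if not done_when:
        # A request without a definition of done is the vague case to avoid.
        raise ValueError("done_when must list at least one check")

    def bullets(items):
        return "\n".join(f"- {item}" for item in items)

    sections = [
        ("Goal", goal),
        ("Context", bullets(context)),
        ("Constraints", bullets(constraints)),
        ("Done when", bullets(done_when)),
    ]
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


prompt = build_prompt(
    goal="Update the Codex article with a more practical workflow.",
    context=["The post is source/_posts/codex-notes.md."],
    constraints=["Keep the article in English."],
    done_when=["npm run clean && npm run build passes."],
)
print(prompt)
```

The resulting string can be pasted into a Codex session as-is; the point of the helper is only that the "Done when" section can never be silently left out.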

2. Use Planning for Fuzzy Work

When a task is complex, unclear, or likely to touch multiple files, I do not ask Codex to jump straight into editing. I ask it to plan first.

Good planning requests sound like this:

Inspect the current implementation first.
Do not edit files yet.
Explain which files are involved and propose a short plan.
Call out risks before implementation.

This is useful when working on layout, navigation, deployment, or anything that can regress existing behavior. For this blog, a good example is the audio player: changing page navigation can accidentally recreate the <audio> node and restart the music. Planning helps catch that before code changes happen.

For bigger work, I also like asking Codex to interview me:

I have a rough idea but not a precise spec.
Ask me the minimum questions needed to turn it into an implementation plan.
Challenge unclear assumptions.

That keeps the human responsible for product direction while still using Codex to make the task concrete.

3. Put Durable Rules in AGENTS.md

Repeated instructions should not live in every prompt. They should live in project guidance.

For this repository, AGENTS.md records rules like:

# Agent Instructions

- After each completed small feature loop, create a Git commit.
- Keep commits scoped to the completed work.
- After development, run relevant tests or checks yourself.
- If testing reveals problems, fix them and repeat verification until the checks pass.
- Before making changes in a fresh context, read `.spec/README.md` first.

This makes Codex more consistent across sessions. The goal is not to write a huge instruction file. A short, accurate file is better than a long document full of vague preferences.

My current pattern is:

AGENTS.md
  high-level workflow rules

.spec/README.md
  progressive disclosure entry point

.spec/content-editing.md
  how to edit posts, about page, projects, music, favicon, and ICP text

.spec/theme-guidelines.md
  visual and interaction constraints

.spec/deployment.md
  Codeup Flow, Nginx, and deployment checks

This lets a new Codex session load only the guidance it needs instead of reading every document in the repo.
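One way to picture progressive disclosure is as a lookup from the kind of task to the spec files worth reading. In practice .spec/README.md does this routing in prose; the Python sketch below only illustrates the idea. The file paths come from the layout above, while the topic keys and the `specs_for` helper are hypothetical.

```python
# File paths mirror the layout above; the topic keys are hypothetical labels.
SPEC_INDEX = {
    "content": ".spec/content-editing.md",
    "theme": ".spec/theme-guidelines.md",
    "deployment": ".spec/deployment.md",
}


def specs_for(topics):
    """Return the entry point plus only the spec files the task touches."""
    files = [".spec/README.md"]  # the entry point is always read first
    for topic in topics:
        path = SPEC_INDEX.get(topic)
        if path is None:
            raise KeyError(f"no spec registered for topic: {topic}")
        if path not in files:
            files.append(path)
    return files


print(specs_for(["content"]))
```

A post edit loads two files instead of four, which is the whole benefit: less irrelevant guidance competing for attention in a fresh session.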

4. Configure Codex for the Real Environment

Many bad results are not model problems. They are setup problems: wrong working directory, missing permissions, missing tools, missing context, or unclear verification commands.

I want Codex to know the real project commands:

npm run clean
npm run build
node --check themes/kaomoji-pixel/source/js/home.js
git status --short

I also want it to know how much freedom it has. For trusted personal projects, I may allow broader local edits. For unfamiliar codebases, I prefer tighter approvals and smaller tasks.

The useful configuration layers are:

~/.codex/config.toml
  personal defaults

.codex/config.toml
  repo-specific settings

AGENTS.md
  human-readable project rules

The important point is to configure Codex around the workflow I actually use, not around an idealized workflow I will forget to follow.
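The layering itself can be sketched as a simple override: personal defaults first, then repo-specific settings on top. This Python sketch rests on two assumptions that should be checked against the Codex configuration reference: that repo settings take precedence over personal ones, and that the key names and values shown are valid. They are used here only as illustrative dictionary entries, not as a copy of a real config file.

```python
def merge_layers(personal, repo):
    """Shallow merge: repo-specific settings override personal defaults."""
    merged = dict(personal)  # copy so the personal defaults stay untouched
    merged.update(repo)
    return merged


# Illustrative values only; verify key names against the Codex docs.
personal_defaults = {"approval_policy": "on-request", "sandbox_mode": "read-only"}
repo_settings = {"sandbox_mode": "workspace-write"}

effective = merge_layers(personal_defaults, repo_settings)
print(effective)
```

Thinking about the layers this way makes it easier to decide where a setting belongs: cautious defaults go in the personal file, and a trusted repository loosens only the settings it actually needs.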

5. Make Testing and Review Part of the Request

Codex should not stop after editing files. It should verify the change.

For this blog, a normal content change should end with:

npm run clean && npm run build

For JavaScript behavior changes, it should also run:

node --check themes/kaomoji-pixel/source/js/home.js

For deployment changes, I want Codex to run a local simulation when possible, or at least the individual commands the deployment would run.
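The verification step can be wrapped in a small runner that executes checks in order and stops at the first failure. This is a sketch under stated assumptions: `run_checks` is a hypothetical helper, and the demo uses stand-in commands built from the current Python interpreter so it runs anywhere. In the blog repository the list would instead hold the npm and node commands above.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each command in order and stop at the first failure.

    Returns (ok, failed_command); failed_command is None when all pass.
    """
    for command in commands:
        try:
            result = subprocess.run(command)
        except OSError:
            # A missing tool is a setup problem; report it as a failed check.
            return False, command
        if result.returncode != 0:
            return False, command
    return True, None


# Stand-in commands: the second one exits with status 1 on purpose.
demo = [
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "raise SystemExit(1)"],
    [sys.executable, "-c", "pass"],
]
ok, failed = run_checks(demo)
print(ok, failed)
```

Returning the failing command, rather than only a boolean, gives Codex something concrete to fix before it reruns the checks.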

I also like asking Codex to review its own diff:

Before committing, review the diff for:
- unrelated changes
- broken responsive layout
- missing verification
- stale documentation

This does not replace human review, but it catches many simple mistakes.
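The first item on that checklist, unrelated changes, is easy to check mechanically. A minimal Python sketch, assuming the changed paths have already been collected (for example from `git diff --name-only`) and that the task's scope can be expressed as path prefixes. The `unrelated_changes` helper is hypothetical.

```python
def unrelated_changes(changed_paths, allowed_prefixes):
    """Return changed paths that fall outside the task's expected scope."""
    return [
        path
        for path in changed_paths
        if not any(path.startswith(prefix) for prefix in allowed_prefixes)
    ]


# Both paths appear in this post; the scope is a content-only edit.
changed = [
    "source/_posts/codex-notes.md",
    "themes/kaomoji-pixel/source/js/home.js",
]
print(unrelated_changes(changed, allowed_prefixes=["source/_posts/"]))
```

For a content-only task, a theme file in the result is a signal to stop and ask why it changed before committing.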

6. Use MCP for Context Outside the Repo

Some context changes too often to paste into a prompt: docs, dashboards, tickets, deployment systems, external APIs, and internal tools. MCP is useful when Codex needs that context repeatedly.

A simple CLI shape looks like this:

codex mcp add openaiDeveloperDocs --url https://developers.openai.com/mcp

I do not want to connect every possible tool immediately. A better rule is: add an MCP server only when it removes a real manual loop.

Good MCP candidates:

current product documentation
issue trackers
CI status
deployment metadata
internal search
observability tools

Bad MCP candidates are tools I might use once and never touch again.

7. Turn Repeated Work Into Skills

If I keep giving Codex the same long prompt, that workflow probably wants to become a skill.

Good skill candidates are narrow and repeatable:

writing release notes
reviewing a PR against a checklist
triaging logs
planning a migration
summarizing recent commits
applying a standard debugging flow

A good skill has a clear trigger, a small scope, and a concrete output. I would rather have three small reliable skills than one giant skill that tries to cover every case.

The pattern I like:

1. Repeat the workflow manually.
2. Notice which instructions keep coming back.
3. Extract those instructions into a skill.
4. Test it on one real task.
5. Improve only after it proves useful.
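Step 2 can be made less impressionistic by counting which instruction lines recur across past prompts. The Python sketch below assumes the prompts are available as plain strings. The `recurring_instructions` helper, the threshold of three, and the sample prompts are all hypothetical.

```python
from collections import Counter


def recurring_instructions(prompts, min_count=3):
    """Return instruction lines that appear in at least min_count prompts."""
    counts = Counter()
    for prompt in prompts:
        # Count each distinct line once per prompt, ignoring case and padding.
        lines = {line.strip().lower() for line in prompt.splitlines() if line.strip()}
        counts.update(lines)
    return sorted(line for line, n in counts.items() if n >= min_count)


# Made-up prompt history for illustration.
history = [
    "Summarize recent commits.\nGroup changes by area.",
    "Draft release notes.\nGroup changes by area.",
    "Group changes by area.\nKeep it under 200 words.",
]
print(recurring_instructions(history))
```

Lines that clear the threshold are the ones worth extracting; everything else is task-specific and can stay in the prompt.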

8. Automate Only Stable Workflows

Automation is powerful only after the manual workflow is reliable. If I still need to steer Codex heavily, the workflow is not ready for automation.

Good automation candidates:

checking CI failures
drafting release notes
summarizing recent commits
scanning for common bugs
generating maintenance reports

The distinction I like is:

skills define the method
automations define the schedule

If the method is not stable, scheduling it only creates repeated noise.
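The readiness rule can be written down as a gate: schedule a workflow only after its most recent manual runs needed no steering. This Python sketch is one possible encoding, not a Codex feature. The `ready_to_automate` helper and the default of three clean runs are assumptions chosen for illustration.

```python
def ready_to_automate(steering_counts, clean_runs_required=3):
    """True when the most recent manual runs all needed zero corrections.

    steering_counts: corrections given per manual run, oldest first.
    """
    if clean_runs_required < 1:
        raise ValueError("clean_runs_required must be at least 1")
    recent = steering_counts[-clean_runs_required:]
    return len(recent) == clean_runs_required and all(n == 0 for n in recent)


print(ready_to_automate([2, 1, 0, 0, 0]))
print(ready_to_automate([0, 0, 1]))
```

Only the tail of the history matters: early runs that needed heavy steering do not block automation once the method has settled, and a single recent correction resets the count.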

9. Manage Sessions Deliberately

Long Codex sessions collect context, decisions, and mistakes. The accumulated context can help, but it can also bury the current task under stale decisions.

For me, the practical rule is one thread per coherent task. If the task changes direction, I prefer starting a new thread or forking the conversation.

Useful session commands include:

/status
/resume
/compact
/fork
/plan

I use /compact when the context is getting long but the task is still the same. I use /fork when the work branches into a different direction.

10. Common Mistakes I Want to Avoid

The mistakes I have to watch for are predictable:

- giving Codex a vague goal and expecting precise output
- keeping durable rules in prompts instead of AGENTS.md
- skipping planning on multi-file changes
- not giving Codex the commands needed to verify work
- allowing broad permissions before understanding the workflow
- running multiple active agents on the same files without coordination
- automating a workflow before it works manually
- using one endless thread for every task in a project

Most of these are not about intelligence. They are about process.

My Current Codex Workflow

For this blog, the workflow I want Codex to follow is:

1. Read .spec/README.md.
2. Read only the relevant spec files.
3. Inspect the existing code before editing.
4. Make a narrow change.
5. Run the relevant checks.
6. Fix failures.
7. Commit the completed loop.
8. Push when the task should deploy.

That workflow makes Codex feel less like a text generator and more like a participant in a disciplined development loop.
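Steps 5 through 7 form a loop with a clear exit condition, which can be sketched in Python. The function below is illustrative only: `development_loop`, its callback parameters, and the cap of three attempts are assumptions, and the demo uses stub callbacks instead of real checks and commits.

```python
def development_loop(run_checks, fix_failures, commit, max_attempts=3):
    """Check, fix, and re-check; commit only once the checks pass.

    Returns the number of check runs it took to reach a passing state.
    """
    for attempt in range(1, max_attempts + 1):
        if run_checks():
            commit()
            return attempt
        if attempt < max_attempts:
            fix_failures()
    raise RuntimeError(f"checks still failing after {max_attempts} attempts")


# Stub callbacks: the checks fail twice, then pass.
results = iter([False, False, True])
log = []
attempts = development_loop(
    run_checks=lambda: next(results),
    fix_failures=lambda: log.append("fix"),
    commit=lambda: log.append("commit"),
)
print(attempts, log)
```

The attempt cap matters as much as the loop: if the checks never pass, the sketch raises instead of committing, which is the point where a human should step back in.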

Source

This article summarizes and adapts ideas from OpenAI’s Codex best practices guide:

https://developers.openai.com/codex/learn/best-practices