consultance.ai
Finance and data

OpenAI Self-Improving Tax Agent — Goodbye Tax Season

The exact OpenAI build behind the 7,000-return, 97%-accuracy tax agent, packaged as 8 prompts you run on your own client data in your own OpenAI account. Read messy K-1s, rental schedules, and prior-year notes, reconcile every figure across documents, hunt the deductions a rushed preparer skips, and draft the return to a named human sign-off. Then a weekly Codex loop turns every correction into a regression test, the mechanic that took the source agent from 25 to 86 percent field accuracy in six weeks. OpenAI native end to end.

Rent an army every April, or run the 7,000-return agent yourself.

Step 1

Paste this setup prompt — Claude installs it for you

Easy mode · paste this into Claude

Never used Claude before? It is free to start. Open it in a new tab, copy the prompt, paste it in. It asks one question, then walks you through everything.

  1. Open claude.ai ↗
     Sign up free. No card. Takes 30 seconds.

  2. Copy the prompt
     One click. Lands on your clipboard.

  3. Paste + send
     Claude asks what you need and guides you the rest of the way.

Preview the prompt (you do not need to read it)
I want to stand up OpenAI's self-improving tax agent on my own client data. Walk me through it.

This is NOT a single coding install. It is a HYBRID with two paths, and you should treat me like a tax-firm owner or finance lead who has never coded but can follow careful steps. IT help is available if I choose the API path.

**Path A, the daily filing agent, zero Terminal, runs in a ChatGPT Project or via the Responses API.**
**Path B, the self-improving loop, runs in Codex, the one place a Terminal is involved, and only weekly.**

Do Path A first (returns get drafted today). Add Path B once Path A drafts are good enough to trust at the human gate.

---

## Path A, the filing agent (no Terminal)

Pick how my data lives, then we go:

1. Go to `chatgpt.com`, sign in. Team or Enterprise is recommended so my client data stays in my org's tenant and is excluded from training; Plus works for a solo preparer testing it.

2. In the left sidebar, click "Projects" (or "GPTs" and then "Projects" if it is nested there). Create a Project named "Tax Agent, [client or season]".

3. Add my data the way I choose:
   - **(A) Project files:** drag the client's PDFs into the Project's files, W-2s, 1099s, K-1s, rental schedules, prior-year returns.
   - **(B) Paste:** paste raw text into the chat for a one-off.
   - **(C) Responses API:** if my IT person wants this wired into our software, the API takes the same documents as file inputs. Ask me to loop in IT and I will switch to API instructions.

4. Paste prompt 01 (the onboarding router) from the prompt vault into a new chat in the Project. Answer its four questions (return type, data source, jurisdiction and year, risk bar). It will not analyze until I answer.

5. Run prompts 02 to 06 in order on one real return. Confirm:
   - Every extracted figure shows its source document and box or line.
   - Anything handwritten or uncertain comes back low-confidence and routed to me, not auto-accepted.
   - Nothing files itself. Prompt 06 stops at a named human sign-off.
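
For reference, the checks in step 5 can be sketched in a few lines of Python. This is a minimal illustration, not part of the source build: the `ExtractedFigure` fields, the `triage` helper, and the 0.90 threshold are hypothetical stand-ins for whatever risk bar I set in prompt 01.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExtractedFigure:
    # All field names are hypothetical; adapt them to the agent's real output.
    label: str                       # e.g. "wages"
    value: float
    source_document: Optional[str]   # e.g. "W-2.pdf"
    box_or_line: Optional[str]       # e.g. "Box 1"
    confidence: float                # 0.0 to 1.0, assumed to come from the agent
    handwritten: bool = False


REVIEW_THRESHOLD = 0.90  # assumed risk bar, not a recommended value


def review_reasons(fig: ExtractedFigure,
                   threshold: float = REVIEW_THRESHOLD) -> List[str]:
    """Return every reason this figure must go to a human reviewer."""
    reasons = []
    if not fig.source_document or not fig.box_or_line:
        reasons.append("missing source citation")
    if fig.handwritten:
        reasons.append("handwritten")
    if fig.confidence < threshold:
        reasons.append("low confidence")
    return reasons


def triage(figures: List[ExtractedFigure],
           threshold: float = REVIEW_THRESHOLD
           ) -> Tuple[List[ExtractedFigure],
                      List[Tuple[ExtractedFigure, List[str]]]]:
    """Split figures into traceable ones and ones routed to a reviewer."""
    traceable, routed = [], []
    for fig in figures:
        reasons = review_reasons(fig, threshold)
        if reasons:
            routed.append((fig, reasons))
        else:
            traceable.append(fig)
    return traceable, routed
```

A figure landing in `traceable` only means it has a citation and cleared the threshold. It still goes through the named human sign-off at prompt 06.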

---

## Path B, the self-improving loop (Codex, weekly, light Terminal)

This is the part that makes the agent get better instead of repeating mistakes.

1. Install Codex once. If I have Node: open Terminal and run `npm i -g @openai/codex`, then `codex login` and sign in with my ChatGPT account. No API key needed if I log in with ChatGPT.

2. Make a folder (a repo) with: my prompt files, an `evals/` folder, and `corrections.jsonl`. Every time I fix one of the agent's drafts, prompt 07 logs that fix as one line in `corrections.jsonl`.

3. Once a week, from inside that folder, run Codex with prompt 08. Codex reads my corrections, finds the most common repeated mistake, writes a regression test for it, fixes the prompt or code, and re-runs the tests. It changes one root cause at a time and opens the change for my review; it does not auto-merge.

4. Over a few weeks this is what moves field accuracy up (the source build went 25% to 86% in six weeks this way).
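
For reference, here is a rough Python sketch of the bookkeeping behind steps 2 to 4. It is an illustration under assumptions, not the source build: the record fields (`field`, `agent_value`, `corrected_value`, `root_cause`) and all function names are hypothetical, and `field_accuracy` assumes "field accuracy" means the share of fields a reviewer left unchanged.

```python
import json
from collections import Counter
from pathlib import Path
from typing import List, Optional, Tuple


def log_correction(path: str, field: str, agent_value: str,
                   corrected_value: str, root_cause: str) -> None:
    """Append one correction as a single JSON line (hypothetical schema)."""
    record = {
        "field": field,
        "agent_value": agent_value,
        "corrected_value": corrected_value,
        "root_cause": root_cause,
    }
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_corrections(path: str) -> List[dict]:
    """Read every non-empty line of the JSONL file back into dicts."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def most_common_root_cause(corrections: List[dict]) -> Optional[Tuple[str, int]]:
    """Return (root_cause, count) for the most repeated mistake, if any."""
    counts = Counter(c["root_cause"] for c in corrections)
    return counts.most_common(1)[0] if counts else None


def field_accuracy(total_fields: int, corrected_fields: int) -> float:
    """Assumed definition: fraction of fields that needed no correction."""
    if total_fields <= 0:
        raise ValueError("total_fields must be positive")
    return (total_fields - corrected_fields) / total_fields
```

Under that assumed definition, 14 corrected fields out of 100 gives `field_accuracy(100, 14)`, or 0.86. Tracking the number week over week shows whether the loop is working.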

---

## Rules

- One step at a time. Tell me which path I am on (A or B) before each instruction.
- Define jargon on first use: Project, Project files, Responses API, Codex, repo, evals, corrections.jsonl, review gate, K-1, carryforward.
- Path A never needs a Terminal. Only Path B steps 1 and 3 touch the Terminal; walk me through those commands with expected output and common errors (`command not found: codex` points to Node or the install path; `not inside a trusted directory` means add `--skip-git-repo-check` or run `git init`).
- Anti-pattern callouts:
  - If I say "my firm blocks chatgpt.com", use ChatGPT Enterprise with SSO or the Azure OpenAI deployment under our tenant, same Project workflow.
  - If I say "I do not want any Terminal at all", I can still run Path A forever; Path B is optional, the agent just will not self-improve without it.
  - If Codex errors, NEVER tell me "it is not possible". Diagnose plainly (Node missing, not logged in, not a git folder) and give the fix.
- Never tell me to file a return. A named human preparer signs off at prompt 06, every time.
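
For reference, the plain diagnosis asked for in these rules could look something like the Python sketch below. It is illustrative only: `diagnose`, `preflight`, and the `KNOWN_ERRORS` table are hypothetical names, and the table covers just the two error strings named in the rules.

```python
import shutil
from pathlib import Path
from typing import List

# Error fragments and fixes taken from the rules; matching is case-insensitive.
KNOWN_ERRORS = [
    ("command not found: codex",
     "Codex is not on the PATH. Check that Node is installed, then re-run "
     "`npm i -g @openai/codex`."),
    ("not inside a trusted directory",
     "The folder is not a git repo. Run `git init` in it, or add "
     "`--skip-git-repo-check`."),
]


def diagnose(error_text: str) -> str:
    """Map a raw Terminal error to a plain-language fix."""
    lowered = error_text.lower()
    for fragment, fix in KNOWN_ERRORS:
        if fragment in lowered:
            return fix
    return ("Unrecognized error. Check that Node is installed, that "
            "`codex login` succeeded, and that the folder is a git repo.")


def preflight(folder: str = ".") -> List[str]:
    """List likely problems before the weekly run (empty list means none found)."""
    problems = []
    if shutil.which("node") is None:
        problems.append("Node missing")
    if shutil.which("codex") is None:
        problems.append("codex not installed or not on PATH")
    if not (Path(folder) / ".git").exists():
        problems.append("not a git folder")
    return problems
```

`preflight` cannot tell whether I am logged in; that still needs `codex login` run by hand.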

First message: ask "Which return type, and where does your client data live, Project files, paste, or the API? And do you want Path B (the weekly self-improving loop) set up now or later?" Then start on the right path.

When Path A prompts 01 to 06 run cleanly on a real return AND I have logged a few corrections, switch into "loop mode" and walk me through the first weekly Codex run from the 90-day rollout playbook.

Step 2

Step 1 installed it. Now run these 8 prompts on your own data.

the vault

The 8 prompts

Tap a prompt to jump to it. Hit copy. Replace the tokens. Paste into Claude Opus 4.7.

where it breaks

Before you connect live data

  • Run dummy data first. Real client data is not a test bed.
  • API keys never go in a public repo. Use env vars and a secrets manager.
  • Add logging, access control, monitoring, and a rollback path before launch.
  • Read the license. Forking a repo without checking is how lawsuits start.
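
One way to follow the env-var rule, sketched in Python. The helper names `load_api_key` and `redact` are illustrative and not part of any SDK; `OPENAI_API_KEY` is the variable the official OpenAI Python client reads by default.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 name: str = "OPENAI_API_KEY") -> str:
    """Read the key from the environment and fail loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell or inject it from "
            "your secrets manager; do not hard-code it in the repo."
        )
    return key


def redact(key: str) -> str:
    """Return a log-safe form of the key that shows only its last 4 characters."""
    return "***" + key[-4:] if len(key) >= 8 else "***"
```

Passing `env` explicitly keeps the function easy to exercise with dummy values before any real key is involved.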

license note

Credit the original author

Prompt set and build authored by consultance.ai. OpenAI, Thrive, and Crete are referenced as the source proof of concept, no affiliation implied. Your client documents stay in your own OpenAI tenant; we never see them. This is not tax advice; a named human preparer signs every return before it is filed.

the newsletter

AI news worth opening.

The AI tools, launches, and shifts that actually matter, in plain English. New library drops the moment they land.

100% free · No paywall, ever · Unsubscribe anytime

Read this far? You want the 7,000-return agent on your own data, not an April army. Let us build it — with a named human signing every return.