A new feature in our team goes through eight stops on its way to production. AI is in the loop on three of them. The other five are pure engineering judgement. This post walks the full pipeline using a real example: a "saved search" feature on a B2B SaaS dashboard.
The pipeline
Eight stops from spec to production
Total time for the example feature: 1.5 working days. Same feature without AI scaffolding: 3 to 4 days.
Step 1: Product spec
Product manager writes a one-page spec. What "saved searches" means, who can save them, the limits (max 50 per user), the URL behaviour. No engineering content yet.
Step 2: Engineer writes brief (5 min)
The engineer translates the product spec into an engineering brief:
Add a "Saved Searches" feature to the dashboard.
Files to add:
- app/api/saved-searches/route.ts: GET (list) + POST (create)
- app/api/saved-searches/[id]/route.ts: DELETE
- components/saved-searches/SavedSearchList.tsx
- prisma/schema.prisma: add SavedSearch model
Acceptance:
- User can save a search with a name. Max 50 per user.
- GET returns the user's saved searches.
- DELETE removes one. Auth required on every endpoint.
- Tests cover the auth boundary and the 50-item limit.
Constraints:
- Use existing auth middleware in lib/auth.ts.
- Match the patterns in app/api/users/route.ts.
- No new dependencies.
This brief is the most important artefact in the workflow. It pins down scope, file layout, and acceptance criteria before any code exists, which gives the review in Step 4 something concrete to check the draft against.
Step 3: AI drafts the change (15 to 60 min)
The engineer runs Claude Code with the brief. The tool:
- Reads the existing patterns (auth middleware, the users API).
- Drafts the new files.
- Updates the Prisma schema.
- Generates a Playwright spec for the happy path.
The output is a complete, runnable change. The engineer hasn't written any code yet. They wrote a brief and watched the diff appear.
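To make the output concrete, here is a minimal sketch of what the drafted route handler looks like. It is illustrative rather than the verbatim AI output: the requireUser helper stands in for the existing middleware in lib/auth.ts, and the Prisma import path and field names are assumptions.

```ts
// app/api/saved-searches/route.ts (sketch, not the verbatim draft)
import { NextResponse } from "next/server";
import { requireUser } from "@/lib/auth"; // stand-in for the existing auth middleware
import { prisma } from "@/lib/prisma"; // assumed shared Prisma client

export async function GET(req: Request) {
  const user = await requireUser(req);
  if (!user) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }
  const searches = await prisma.savedSearch.findMany({
    where: { userId: user.id },
    orderBy: { createdAt: "desc" },
  });
  return NextResponse.json(searches);
}

export async function POST(req: Request) {
  const user = await requireUser(req);
  if (!user) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }
  const { name, query } = await req.json();

  // Enforce the 50-per-user limit from the brief.
  const count = await prisma.savedSearch.count({ where: { userId: user.id } });
  if (count >= 50) {
    return NextResponse.json(
      { error: "Limit of 50 saved searches reached" },
      { status: 400 }
    );
  }

  const created = await prisma.savedSearch.create({
    data: { userId: user.id, name, query },
  });
  return NextResponse.json(created, { status: 201 });
}
```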
Step 4: Engineer reviews and tightens (30 to 120 min)
This is the unskippable step. The engineer:
- Reads every line of the diff.
- Notices the AI used findFirst where findUnique is correct. Fixes it.
- Notices the test only covers the happy path. Adds two failure-case tests (sketched after this list).
- Notices the AI added a description field to the schema that wasn't in the brief. Removes it.
- Runs the tests locally. They pass.
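The two failure-case tests target the auth boundary and the 51st item, matching the acceptance criteria in the brief. A sketch using Playwright's request fixture; the seeded test user and the logged-in storageState are assumptions:

```ts
// Failure-case tests (sketch). Assumes a baseURL in playwright.config.ts
// and a default storageState for a seeded test user.
import { test, expect } from "@playwright/test";

test.describe("auth boundary", () => {
  test.use({ storageState: { cookies: [], origins: [] } }); // drop the session

  test("rejects unauthenticated requests", async ({ request }) => {
    const res = await request.post("/api/saved-searches", {
      data: { name: "Churn risk", query: "status:at-risk" },
    });
    expect(res.status()).toBe(401);
  });
});

test("rejects the 51st saved search", async ({ request }) => {
  // Assumes the logged-in test user already has 50 saved searches
  // seeded (setup not shown).
  const res = await request.post("/api/saved-searches", {
    data: { name: "One too many", query: "plan:enterprise" },
  });
  expect(res.status()).toBe(400);
});
```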
Two patterns we see repeatedly:
- AI tends to over-deliver. Extra fields, extra error handling, extra abstractions. The reviewer's job is to trim.
- AI tests test what the AI built. A real reviewer adds tests that test what the brief asked for.
Step 5: Local test and edge case pass
Engineer runs the change end-to-end in the local environment. Logs in as a real test user. Creates 50 saved searches. Tries to create a 51st. Confirms it fails. Tries with a stale auth token. Confirms it fails.
The edge case pass takes maybe 15 minutes. It's the cheapest insurance in the pipeline.
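The 50-then-51 check is easy to script if you want that pass to be repeatable. A sketch against a local dev server; the base URL and the SESSION_COOKIE variable are placeholders for whatever your local login produces:

```ts
// scripts/edge-case-pass.ts (sketch): exercise the 50-item limit locally.
const BASE = "http://localhost:3000"; // placeholder for the local dev server
const COOKIE = process.env.SESSION_COOKIE ?? ""; // placeholder session cookie

async function createSearch(i: number): Promise<number> {
  const res = await fetch(`${BASE}/api/saved-searches`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Cookie: COOKIE },
    body: JSON.stringify({ name: `search-${i}`, query: `q=${i}` }),
  });
  return res.status;
}

async function main() {
  // Creates 1 through 50, which should all succeed.
  for (let i = 1; i <= 50; i++) {
    const status = await createSearch(i);
    if (status !== 201) throw new Error(`Create #${i} failed with ${status}`);
  }
  // The 51st should be rejected by the per-user limit.
  const status = await createSearch(51);
  console.log(status === 400 ? "51st correctly rejected" : `Unexpected status: ${status}`);
}

main();
```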
Step 6: PR opened, CI runs
Engineer opens a PR. The brief becomes the PR description. CI runs:
- Type-check.
- Lint.
- Unit tests.
- E2E tests in Playwright (the happy-path spec is sketched below).
- Security scan (Semgrep, dependency scan).
- Coverage check.
Everything green. If anything goes red, the engineer fixes locally and re-pushes.
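For reference, the happy-path Playwright spec the AI generated in Step 3 is what runs at the E2E stage. Roughly (a sketch; the selectors and the logged-in storageState are assumptions):

```ts
// Happy path (sketch): save a search from the dashboard and see it listed.
import { test, expect } from "@playwright/test";

test("user can save and see a search", async ({ page }) => {
  await page.goto("/dashboard?status=active"); // assumed dashboard URL
  await page.getByRole("button", { name: "Save search" }).click();
  await page.getByLabel("Name").fill("Active accounts");
  await page.getByRole("button", { name: "Save" }).click();
  await expect(page.getByText("Active accounts")).toBeVisible();
});
```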
Step 7: Code review
A teammate reviews the PR. Same standard as any other change. Questions like:
- Does this match what the brief asked for?
- Are the tests real or theatrical?
- Did we introduce any new dependencies? (No.)
- Is there anything weird?
The reviewer doesn't need to know which lines were AI-generated. The code is the code.
Step 8: Merge, deploy, observe
PR merged. Deploy pipeline kicks in. Feature flag enabled for the engineering team only. Engineer watches the dashboards for 15 minutes:
- No new error rate.
- No new latency.
- Endpoint hit count looks right.
The flag is then rolled out to 10 percent of customers, and to 100 percent the next day once no issues surface.
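The gating itself is ordinary conditional code. A minimal sketch, assuming a hypothetical flags client; the real one is vendor-specific:

```ts
// Sketch: gate the feature behind a flag. The flags client and flag key
// are hypothetical; rollout moves from the team to 10% to 100%.
import { flags } from "@/lib/flags";

export async function savedSearchesEnabled(userId: string): Promise<boolean> {
  return flags.isEnabled("saved-searches", { userId });
}
```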
Why this works
The pipeline works because every fast stage is preceded or followed by a slow stage with a human in it. AI compresses the scaffolding from a day to an hour. The review, the edge case pass, and the rollout still run at the same speed they always did. Total feature time drops by 50 to 70 percent on the right kinds of features.
The compression is in scaffolding, not in judgement
When teams ask why their AI workflow isn't 5× faster, the answer is almost always that the compression is concentrated in Step 3 and part of Step 4. The product spec, the brief, the review, the PR review, and the observation window still take as long as they always did. If you've removed time from those, you've removed safety, not noise.
When this workflow doesn't fit
- Pure exploration. When you don't know what you're building yet, the brief step is impossible. Use AI for one-off scratchpad work, not for end-to-end features.
- Highly coupled systems. When a change touches 15 files in unpredictable ways, AI can't hold the context. Break the work down first.
- Compliance-heavy code. PCI-DSS or HIPAA-scoped code needs deeper review than the standard pipeline. We slow it down deliberately.
How Hashorn delivers using this pipeline
Hashorn uses this exact workflow for AI software development engagements. We bring our brief format, our review discipline, and our CI patterns. For startups we run short MVP engagements; for longer relationships we run dedicated teams that embed in your sprint.
Conclusion
Prompt-to-production in 2026 looks a lot like ship-to-production in 2024, with one massive difference: the scaffolding step that used to be a day is now an hour. The teams that benefit are the ones that didn't lower their review or testing standards to capture that hour. Velocity goes up. Quality stays the same. That's the goal.
Need help building AI-powered software, QA automation, or secure cloud systems?
Talk to Hashorn's engineering team. Dedicated senior engineers, QA, and security with same-week ramp.