How I use AI to turn Jira epics into Playwright tests
I’m a senior frontend developer, and at some point I offered my help to the QA team with our Playwright automation repository. The idea was to improve coverage, make the repo more reliable, and share knowledge across roles instead of leaving test automation in its own isolated corner.
As I started doing that work, I developed my own workflow.
When a new feature arrives, the hard part is usually not writing locator.click() or asserting a heading. The real friction starts earlier: understanding the feature, extracting the actual business rules from Jira, deciding what deserves coverage, and turning scattered product language into something implementable.
AI has helped me a lot here, but not in the shallow “write this test for me” sense.
The real value has come from using AI to reshape the workflow around test creation: first establish stable context, then derive feature-specific context, then plan the coverage, then implement in iterations, then validate aggressively. I use AI less as a magic coder and more as a layered engineering assistant.
The problem with starting from code
A Jira Epic is rarely implementation-ready.
It may contain the truth, but not in a form that is useful for writing robust tests. Acceptance criteria are often scattered, repetitive, partially implicit, or mixed with business assumptions that live in people’s heads. If I throw that directly into a coding model and ask for Playwright tests, I’m basically asking it to guess. I even did that at the beginning, just to see how good it was at guessing... Happily, AI tools are not that good yet.
So starting that way was a mistake. The coding part is not the first step. It is the fourth.
Instead of jumping straight into implementation, I break the work into artifacts that are reusable and easier for both humans and models to reason about.
My workflow starts before implementation
When I need to cover a new feature, I usually begin in Jira and collect the acceptance criteria from all the tickets in the Epic (so yes, having tickets, even poorly written ones, is a prerequisite for this kind of flow). But I do not start by creating a feature document right away.
First, I create the most reusable artifact in the whole workflow: the general business-rules context.
Only after that do I create the feature-specific document for the new functionality I’m covering. Then I ask for a test plan ordered from simple scenarios to more complex ones.
So the sequence looks like this:
1. a general business-rules context document
2. a feature-specific context document
3. a test plan ordered from simple scenarios to complex ones
That order matters because the first document has long-term value, while the second one is tied to a specific feature.
1. I start with a general business-rules context document
The first artifact I create is the broadest one. I ask the AI to help me write a document that explains the business domain: what kind of company this is, what market it serves, what kinds of users exist in the system, and what rules shape how the product behaves. Some of this may come from onboarding docs, some from the institutional sites, and some from knowledge I gathered along the way.
For example, if the company is an e-commerce business in the US market, that matters. If there are different user types with different permissions, flows, or expectations, that matters too. A test is never just about a button and an assertion. It exists inside a business model.
This document becomes valuable because it is reusable. I can carry it into the next feature, and the next one after that, instead of re-explaining the same domain context every time. That makes it one of the highest-leverage artifacts in the process.
This is one of the first places where AI becomes genuinely useful: it helps turn repeated background explanation into a stable document I can keep reusing across iterations.
This context is always a work in progress and can (and should) be refined or expanded. If it grows too much, I have found it useful to split it into more specific markdown files, all linked from an index-style file.
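As a sketch, that index file can be very simple: a markdown file that links the split documents. The file names below are purely illustrative, not from any real repo:

```markdown
# Business context — index

- [users-and-roles.md](./users-and-roles.md) — user types, permissions, expectations
- [catalog-rules.md](./catalog-rules.md) — pricing, discounts, availability rules
- [checkout-rules.md](./checkout-rules.md) — payment, shipping, market-specific constraints
```

The point is that each model session can be pointed at the index and pull in only the slices of context it needs.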
2. Then I create the feature-specific context document
Once the business-rules context exists, I move to the feature itself.
Now I take the acceptance criteria from the tickets included in the Epic and ask the AI to rewrite them into a cleaner feature-specific document: what the feature does, what rules govern it, what behaviors are conditional, what edge cases are implied, and what states matter from a testing perspective.
This document is more local and more temporary than the business-rules document. It is created when I cover a new feature, and it is shaped by the specifics of that particular Epic.
That distinction matters. The business-rules context is the stable layer. The feature-specific document is the situational layer.
Jira is written for coordination, not for execution. A feature document gives me a stable working view of the feature so I can reason about it more clearly and reduce the chance that I or the implementation model drift away from the actual intent.
3. Then I ask for a test plan designed for machine handoff
Once I have both context artifacts, I ask the AI to create a test plan. Not a vague checklist. A real test plan.
I want scenarios, coverage ideas, dependencies, and ordering. The idea is to cover at least the critical workflows of the new feature. I also want the plan written clearly enough that another LLM can consume it without ambiguity, so I explicitly tell ChatGPT, DeepSeek, or whichever model I use for this planning to write it in an LLM-friendly fashion. That matters because later I may pass it to Claude, Codex, or another implementation-focused model.
One detail has become especially important in how I do this: I ask the AI to order the tests from easier to harder. That is not cosmetic. It changes the implementation dynamic.
Starting with easier scenarios gives me momentum. It helps patterns emerge early. It reveals which helpers, fixtures, selectors, or utilities will probably be reused later. It also lets me validate my understanding before I get into the most complex states and transitions. And yes, it is also about motivation: seeing progress makes you feel stronger, and by the time the complex tests arrive, you are more resilient when it comes to thinking harder and fixing them.
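To make the shape concrete, here is a sketch of what one of these ordered plans might look like. The feature and scenario names are invented for illustration:

```markdown
## Test plan: cart discounts (ordered easy → hard)

1. Guest sees the base price for a non-discounted product
   - depends on: product seed data
   - reuses: nothing yet
2. Logged-in user sees the discounted price in the cart
   - depends on: auth fixture, one discounted product
   - reuses: price locator helper from #1
3. Discount expires mid-session and the cart re-prices on refresh
   - depends on: mocked expiry date
   - reuses: fixtures and helpers from #1–#2
```

Because each entry names its dependencies and what it reuses, an implementation model can work through the list top to bottom without re-discovering the feature at every step.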
Sidenote about the LLMs I use for these first three steps: I don't jump into Claude/Codex yet. I use ChatGPT, Qwen, or DeepSeek for these matters. Then, for the implementation, I do go to the other two. It's also a kind of experiment to see each model's strengths and weaknesses.
4. Only then do I hand it to the coding model
At that point I have a much better foundation:
- A general business-rules context document
- A feature-specific context document
- A test plan ordered by implementation difficulty
Now I hand those artifacts to the implementation model I want to use. As I said: sometimes Claude, sometimes Codex. The tool is secondary. The prepared frame is the important part.
This is the key distinction in my workflow: I do not ask the coding model to figure out the feature from scratch. I ask it to implement inside a context I have already shaped.
That makes the interaction much more disciplined.
Instead of gambling on a giant prompt, I iterate from the test plan. I take the easier scenarios first, let the model implement them, review what it did, and then move on to the next layer of difficulty. The model is no longer doing discovery, planning, implementation, and validation all at once. It is doing a narrower job inside a defined structure.
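An "easy scenario" at this stage is often just a few lines of Playwright. This is a hedged sketch, not code from the real repo: the route, headings, and button names are hypothetical, and it assumes a configured `baseURL`:

```typescript
// Hypothetical first scenario from the plan. Route and copy are illustrative.
import { test, expect } from '@playwright/test';

test('guest user sees the empty-cart message', async ({ page }) => {
  await page.goto('/cart');

  // Role- and text-based locators keep the test readable as a business rule,
  // not as a DOM traversal.
  await expect(
    page.getByRole('heading', { name: 'Your cart is empty' })
  ).toBeVisible();
  await expect(page.getByRole('button', { name: 'Checkout' })).toBeDisabled();
});
```

Tests like this are cheap to review, and they surface the locator and fixture conventions that the harder scenarios in the plan will reuse.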
That is why the results are better.
5. After each test, I run the suite
This part matters because it keeps the whole thing honest.
After each implemented test, I run the Playwright suite to validate that the test is actually working. Not at the end. Not after six speculative tests. After each one.
That tight feedback loop is essential because it prevents silent drift. A model may misunderstand a selector, make a bad assumption about state, overfit to a brittle flow, or accidentally encode something that looks plausible but is wrong. The sooner I catch that, the cheaper the correction.
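In practice that loop is just the standard Playwright CLI, first scoped to the new spec, then widened. The file name here is illustrative:

```shell
# Run only the spec I just implemented (file name is an example):
npx playwright test tests/cart-discounts.spec.ts

# Once it passes, run the whole suite to catch regressions:
npx playwright test
```

Scoping the run keeps each iteration fast enough that validating after every single test is not a burden.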
AI helps me produce faster drafts. Execution still decides what is real.
So the workflow is not “generate a batch of tests and pray.” It is closer to this: establish context, specialize context, plan, implement, validate, repeat.
That is a much healthier loop.
6. At the end, I commit and document what the feature needs
Once the test implementation is complete and stable, I do the normal engineering work: create the commit, clean up what needs cleanup, and add supporting documentation when necessary.
Sometimes that documentation includes data setup or mocks needed to exercise the feature correctly. That part is easy to neglect, but it matters for maintainability. A green test is not enough if nobody understands what conditions are required to keep it green.
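When the required conditions are mocks rather than seed data, I find it helps to keep them executable and commented next to the tests. This is a sketch under assumptions: the endpoint and payload are hypothetical, but `page.route` and `route.fulfill` are standard Playwright APIs:

```typescript
// Illustrative: the data setup this feature needs, documented in code.
import { test } from '@playwright/test';

test.beforeEach(async ({ page }) => {
  // These tests need at least one discounted product to exercise the rule,
  // so the catalog endpoint is stubbed instead of relying on live data.
  await page.route('**/api/products', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([
        { id: 1, name: 'Sample product', price: 10, discount: 0.2 },
      ]),
    })
  );
});
```

A comment like the one above is often the difference between a test someone can keep green and a test someone deletes six months later.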
Because the context was structured early, the closing documentation also tends to come out better. The work is less chaotic from beginning to end.
Why this has helped in my collaboration with QA
One thing I care about in this process is that it is not just about my personal productivity.
When I offered my help to the QA team in the automation repo, I wasn’t trying to pretend I had become a full-time test engineer. I wanted to contribute something useful: stronger coverage, a cleaner workflow, and a more shareable way of reasoning about automated tests.
That is another reason I like this AI-assisted approach: it externalizes reasoning.
The business-rules document, the feature-specific context, and the test plan are not just prompts. They are shared artifacts. They make the work more legible. They reduce hidden context. They help future iterations start from something better than memory and tribal knowledge.
That matters more than shaving off a few minutes of implementation time.
What AI is actually doing for me
AI is helping at several layers of the workflow:
- distilling business context into a reusable artifact
- turning Epic acceptance criteria into feature-specific working context
- converting both into an ordered test plan
- accelerating implementation once the frame is clear
- supporting iteration when reality pushes back
But I don’t think the lesson here is “AI writes my Playwright tests.”
That’s too simplistic.
The real lesson is that AI becomes much more useful when I stop treating it as a one-shot generator and start treating it as part of a structured engineering loop. The better I define the artifacts and boundaries, the better the outputs become.
So no, AI did not replace my testing work.
It changed the shape of it.
I'm starting to work on another article about the Claude/Codex customization (prompts, skills, agents, etc.) I use in this day-to-day workflow.