Joshua Adeyemi

9 Strategies to Get the Most Out of Playwright Test Agents

Learn how to get the most out of Playwright Test Agents and improve your test suite using new AI solutions.

AI is transforming how teams write and maintain tests. Instead of manually authoring every test case from scratch, developers now collaborate with AI agents that understand testing patterns and can generate, debug, and repair tests automatically. To maximize this shift, teams need to equip their AI agents with testing expertise—and understand how to use built-in testing agents effectively.

Playwright Test Agents reduce the repetitive overhead of maintaining large test suites. As applications change, issues like locator drift, trial-and-error debugging, and manual test authoring slow teams down. Agents accelerate routine test creation, improve locator hygiene, and reduce time spent fixing broken tests.

Introduced in Playwright 1.56 (October 2025), the agents comprise three components: 🎭 Planner, 🎭 Generator, and 🎭 Healer. Together, they shift test creation from manual scaffolding to guided, application-aware workflows.

But agents are not autonomous testers. They amplify the quality of the structure they are given. Clear fixtures, helpers, and conventions reinforce good practices, while poorly structured suites reproduce the same issues.

Teams get the most value when agents are used intentionally, within defined boundaries, and alongside solid engineering practices. This article presents nine strategies for using Playwright Test Agents effectively, highlighting real-world patterns of what works, what fails, and why.

Why Playwright Test Agents Are a Big Deal: The Real Capabilities (and Limits)

Understanding what each agent does reveals where they add value and where they do not.

The 🎭 Planner interprets intended actions based on the current page and application state. Given a goal, such as testing a guest checkout flow, it explores the UI and generates a structured, step-by-step plan. This helps teams navigate complex or unfamiliar interfaces, where identifying the correct sequence of interactions can otherwise take considerable time.

The 🎭 Generator converts these plans into executable Playwright test code. It follows patterns from your seed files, including fixtures, helpers, and setup logic, and verifies that selectors exist in the running application while generating code. The result is test code that matches team conventions rather than ad-hoc structure.

The 🎭 Healer focuses on keeping tests up-to-date. When a test fails, it replays the steps, checks the current UI, and suggests fixes for broken interactions. For example, a fragile text-based selector might be replaced with a more reliable role- or attribute-based locator.

By checking selectors during generation and repair, teams catch flaky locators and timing issues before they cause CI failures. This reduces false negatives and unnecessary reruns, making test suites more dependable.

Agents do have limitations. They cannot understand business logic or define test oracles. They can interact with a “Submit Order” button, but they cannot determine whether the correct backend side effects occurred unless those checks are explicitly defined. Agents also cannot reason about complex stateful workflows involving backend setup or domain rules.

Finally, agents cannot fix unstable environments or replace proper test architecture. Timing drift, async rendering issues, and poor test structure are inherited, not solved. Agents amplify the patterns they are given, reinforcing good practices or reproducing existing problems. To get consistent value, teams need clear structure and guidance around how agents are used.

9 Strategies to Get the Most Out of Playwright Test Agents

Understanding what agents can do doesn’t automatically mean they’ll be used effectively. Without proper guidance, generated tests can become harder to maintain. Using agents well requires planning and structured workflows. These nine strategies show how teams get consistent results.

1. Establish Strong Test Architecture Before Letting Agents Write Code

Agents copy what they see. They pick up on existing structure, naming, and patterns in a test suite. If the suite is inconsistent or loosely organized, agent-generated tests often repeat those same problems. When patterns are clear and predictable, the generated output is usually easier to maintain.

Strong architecture gives agents clearer signals to follow. This starts with clean, reusable fixture patterns that consistently handle standard setup. Authentication works best when implemented once through a deterministic login helper rather than repeated across individual tests. Repeated interactions benefit from page objects or component-level abstractions, and stable data states help reduce brittle assumptions.

The seed file is especially important because it acts as the primary reference that agents learn from. If the seed demonstrates proper fixture usage, agents tend to generate tests that rely on fixtures. If it shows a stable data setup and resilient locators, those patterns are more likely to carry forward.

Example of a solid seed file:

import { test, expect } from "./fixtures";

test("seed - basic navigation", async ({ page, authenticatedUser }) => {
  await page.goto("/dashboard");
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});

This seed uses a custom authentication fixture, navigates to a known state, and validates the UI with role-based locators. Tests generated from this example tend to follow the same structure and conventions.

Contrast with a weaker seed:

import { test, expect } from "@playwright/test";

test("seed", async ({ page }) => {
  await page.goto("https://staging.example.com/login");
  await page.locator("#username").fill("admin@test.com");
  await page.locator("#password").fill("password123");
  await page.locator('button[type="submit"]').click();
  await page.waitForURL("**/dashboard");
});

This seed hardcodes URLs, uses fragile selectors, and handles authentication directly in the test. Tests generated from it tend to repeat these patterns across the suite.
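
For completeness, the ./fixtures module that the stronger seed imports might look like the sketch below. The login route, field labels, and environment variable are assumptions; many suites swap the UI login for an API call or a saved storageState.

import { test as base, expect } from "@playwright/test";

type Fixtures = {
  authenticatedUser: { email: string };
};

export const test = base.extend<Fixtures>({
  // Logs in once per test through the UI; swap in an API login or
  // storageState reuse if your application supports it.
  authenticatedUser: async ({ page }, use) => {
    const user = { email: "seed-user@example.com" }; // hypothetical test account
    await page.goto("/login");
    await page.getByLabel("Email").fill(user.email);
    await page.getByLabel("Password").fill(process.env.SEED_USER_PASSWORD ?? ""); // hypothetical env var
    await page.getByRole("button", { name: "Sign in" }).click();
    await page.waitForURL("**/dashboard");
    await use(user);
  },
});

export { expect };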

2. Define Locator Strategy and Selector Hygiene Early

Locator failures account for a significant portion of test maintenance time. When a CSS class changes or an ID is removed, tests can fail. Teams may spend considerable time updating selectors that could have been more stable.

Agents can replicate fragile selectors if conventions are not defined. Without clear rules, agents may choose text selectors, nth-child patterns, or long CSS chains that work initially but can fail after minor UI changes.

Define your strategy before generating tests. Establish a hierarchy of preferred selectors. Playwright recommends using role-based selectors first, then test IDs, and finally other semantic selectors. CSS and XPath should generally be used only as a last resort.

A commonly used order looks like this:

  • Role-based selectors: getByRole('button', { name: 'Submit' })
  • Test ID selectors: getByTestId('checkout-button')
  • Label-based selectors: getByLabel('Email address')
  • Text selectors (for specific content): getByText('Welcome back')
  • CSS selectors: only for unique, stable attributes

Avoid:

  • nth-child or positional selectors
  • Long CSS chains tied to the DOM structure
  • Selectors relying on temporary classes
  • Text matches on dynamic content
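
Putting the hierarchy into practice, here is a hedged before/after sketch; the product page, button name, and status region are hypothetical.

import { test, expect } from "@playwright/test";

test("add to cart uses resilient locators", async ({ page }) => {
  await page.goto("/products");

  // Avoid: position- and class-dependent selectors that break on minor UI changes.
  // await page.locator("div.card:nth-child(3) > button.btn-primary").click();

  // Prefer: locators from the hierarchy above (role first, then label or test ID).
  await page.getByRole("button", { name: "Add to cart" }).first().click();
  await expect(page.getByRole("status")).toContainText("Added to cart");
});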

Document your selector conventions clearly. Include guidance in pull request templates and review agent-generated tests against these conventions. Treat the seed file as the reference for locator strategy, since agents learn by copying what they see. If the seed consistently uses getByRole(), getByLabel(), and getByTestId(), and avoids raw CSS or nth-child patterns, new code tends to follow suit. Consistent locators reduce maintenance and improve test reliability.

3. Build a Human-in-the-Loop Review Cycle for Agent Suggestions

Agents behave like junior engineers. They write working code fast, but they don’t understand domain rules or recognize anti-patterns. Their output requires review before merging.

The problem arises when teams treat agent output as final. A test is generated, passes, and is committed, but over time it can add maintenance costs if it doesn’t follow team standards or validate the correct behavior.

Code review addresses this. Every agent-generated test should follow the same review process as human-written code, including pull requests, feedback, and iteration.

During review, check the following:

  • Architecture alignment: Does the test follow your patterns, use the correct fixtures, and align with your page object approach? If it bypasses helpers or repeats existing functionality, revise it.
  • Selector quality: Are locators stable and consistent with your defined hierarchy? Avoid nth-child selectors, complex CSS chains, or text matches on dynamic content.
  • Test logic: Does the test validate the intended behavior? Agents cannot interpret business rules, so reviewers should confirm that the elements checked demonstrate the correct functionality.
  • Scope creep: Does the test cover unrelated functionality? A login test should not also validate navigation, permissions, or profile display. Keep tests focused.
  • Silent drift: Watch for new patterns introduced by agents. If a generated test creates its own setup instead of using existing fixtures, it can fragment the codebase.

Establish clear review criteria. Provide a checklist for agent-generated tests and guide reviewers to spot common issues.

Example review comment: "This test uses page.locator('.submit-btn'), but we have a standard submitButton() method in BasePage. Please refactor to use the existing method."
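
For reference, the helper that comment points to might look like the sketch below; the actual BasePage is specific to each suite, so treat this as illustrative.

import { type Locator, type Page } from "@playwright/test";

export class BasePage {
  constructor(protected readonly page: Page) {}

  // Centralizes the submit locator so a markup change is fixed in one place.
  submitButton(): Locator {
    return this.page.getByRole("button", { name: /submit/i });
  }
}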

Review feedback creates a loop that improves agent accuracy over time. When patterns are rejected consistently, update your rules, skills, or seed examples to prevent similar issues. Add enforcement through tooling: lint rules that ban raw CSS locators, CI checks that require page object methods, and directory conventions that keep tests organized. The agent becomes more reliable when feedback turns into explicit guidance and automated constraints, not repeated rejections.

4. Use Agents as a Guided Onboarding Tool for New Contributors

Junior engineers and new team members face a learning curve with test automation. They need to understand your patterns, learn conventions, and get familiar with the domain, which can take several weeks.

Agents help reduce that timeline by showing patterns through generated examples. A junior engineer can watch the Generator create a test using proper fixtures and role-based locators, seeing the correct structure without reading lengthy documentation.

Here’s how agents can support onboarding effectively:

  • Pattern demonstration: New engineers see working examples right away. Instead of reading instructions like "use fixtures for authentication," they observe a generated test that imports fixtures correctly and follows the expected structure.

  • Convention reinforcement: Agents follow the patterns in your seed file. When new team members use agents, they receive feedback on what a good structure looks like. Agents generally do not generate tests that violate established patterns if the seed file is well-formed.

  • Reduced documentation burden: You do not need long testing guides. A good seed file and agent examples can teach more effectively than documentation alone.

  • Lower-stakes practice: Junior engineers can safely experiment with agents. They can generate tests, review them, and adjust parameters, learning without affecting production tests.

  • Structured onboarding example: A SaaS company onboarded new QA engineers using agents:

    • Day one: Seed file review and agent setup.
    • Day two: Use the Planner to explore the application and understand user flows.
    • Day three: Use the Generator to create tests and submit them for review.
    • By day five, new engineers could produce maintainable tests.

Teams should not use agents solely for speed. Pair junior engineers with agents, let them generate tests, review output, and understand why specific patterns exist. Knowledge transfer happens faster when examples are concrete and immediate.

5. Integrate Agents Across Your Daily Workflow, Not Just in Isolation

Teams often limit the value of agents when they treat them as short trials. They generate tests during a sprint, review the results, and decide whether to continue. This approach can overlook the longer-term benefits agents provide when embedded in everyday development work.

Agents are most effective when integrated into regular workflows. Their usefulness grows when applied throughout a feature’s lifecycle, not just in a single testing phase.

In practice, this involves using agents at multiple points in daily workflows:

Local development: Developers working on new features can use the Planner to explore work-in-progress UIs. The Planner highlights interaction paths and edge cases that might otherwise be missed. Developers can review a structured test plan that better reflects the feature.

Pull request reviews: During reviews, agents can help validate UI changes. The Generator can produce tests for new components or updated flows, allowing reviewers to assess test coverage alongside code changes rather than waiting for a separate QA cycle.

Debugging failures: When tests fail in CI, the Healer can help investigate. It replays failing steps and identifies whether issues are caused by product changes or outdated selectors, reducing time spent on initial triage.

Trace analysis: Agents can examine trace artifacts when failures occur. By reviewing screenshots and execution steps, they can suggest likely causes and possible fixes, helping teams move from failure detection to resolution more efficiently.

Iterative test building: The Planner can also be used during feature design. Teams can outline test coverage before implementation, use those plans to guide development, and update them as features change, regenerating tests when needed.

Used this way, agents become part of the team’s regular development process, supporting consistent test coverage instead of being applied only after issues arise.

6. Create Boundaries: Know When Not to Use Agents

Agents are effective for interaction-heavy testing, but they are not suited to every type of validation. Teams lose efficiency when agents are applied to problems that require domain understanding or controlled backend state.

Agents are typically less suitable in the following cases:

Domain-logic heavy tests: Tests that validate complex business rules require knowledge that agents do not have. An agent can navigate a pricing or tax flow, but it cannot determine whether the calculations comply with regulations or business policy. These tests are better designed and validated by domain experts.

Multi-step, stateful workflows: Flows that depend on specific backend state, seeded data, or coordinated service behavior exceed what agents can reason about. Agents interact with the UI but cannot verify whether databases, background jobs, or dependent services are in the correct state.

Unstable environments: In staging or preview environments with timing issues, partial rendering, or inconsistent data, agents inherit the same instability. While selector healing can help with UI changes, it cannot compensate for unreliable environments.

Backend orchestration and mocking: Tests that require database seeding, service mocks, or custom backend configuration still need explicit human setup. Agents can run steps once those conditions exist, but they do not define backend prerequisites or orchestration logic.
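
As an illustration of the kind of explicit setup agents will not define on their own, a human-authored service mock might look like this sketch; the route and the fallback copy are hypothetical.

import { test, expect } from "@playwright/test";

test("shows fallback pricing when the quote service is down", async ({ page }) => {
  // Human-defined mock: agents can drive the UI, but deciding to simulate a
  // 503 from the quote service is a design decision they cannot make.
  await page.route("**/api/quotes", (route) =>
    route.fulfill({
      status: 503,
      contentType: "application/json",
      body: JSON.stringify({ error: "unavailable" }),
    })
  );
  await page.goto("/checkout");
  await expect(page.getByText("Pricing temporarily unavailable")).toBeVisible();
});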

Here's how these limitations show up in practice: In insurance pricing workflows, agents can generate tests that navigate forms and submit inputs, but they cannot verify whether calculated premiums align with actuarial rules. These tests may pass while still missing incorrect pricing behavior.

Teams often address this by keeping pricing validation manual and using agents primarily for navigation and interaction coverage. This helps clarify where agent-generated tests add value and where human validation is still required.

7. Provide Clean, Deterministic Data States

Agents rely on consistent environments. They explore applications, verify locators, and generate tests against running interfaces. If the application behaves unpredictably, agent-generated tests may be unreliable.

Agents are affected by:

  • Inconsistent test data: Differences in user permissions, profile states, or seeded data between runs can cause tests to fail unexpectedly.
  • Slow or partial rendering: Applications that render slowly or inconsistently can confuse agents, leading to unreliable test capture.
  • Unstable environments: Frequent restarts, brief service outages, or overlapping deployments in staging can introduce timing issues into tests.

Ways to keep data and environments consistent:

  • Factory endpoints: Use API endpoints to create test data on demand and start agent runs from known states.
  • Data isolation: Avoid shared staging data. Use test-specific datasets so each run starts from the same baseline.
  • Pre-seed before agent runs: Execute setup scripts to clear old data and seed new records.
  • Disable animations / consistent UI timings: Turn off UI transitions or ensure predictable rendering so agents capture stable elements.
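
One way to combine the first three points is a global setup that resets and seeds data before every run, as in the sketch below; the /test-api/* endpoints are hypothetical stand-ins for whatever factory or reset mechanism your backend exposes. Register it via globalSetup in playwright.config.ts.

import { request } from "@playwright/test";

export default async function globalSetup() {
  const api = await request.newContext({ baseURL: process.env.BASE_URL });
  // Reset to a known baseline, then seed the records the agents and tests expect.
  await api.post("/test-api/reset");
  await api.post("/test-api/seed", {
    data: { users: [{ email: "seed-user@example.com", role: "admin" }] },
  });
  await api.dispose();
}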

8. Choose the Right LLM and Limit Agent Scope

Model choice affects how reliably agents follow your conventions. Stronger models tend to respect Rules, Skills, and repository structure more consistently. Weaker or lower-cost models often drift from established patterns, such as generating inline selectors instead of Page Objects, skipping fixtures, or placing files incorrectly, which can lead to brittle code. Teams looking for predictable, maintainable output should consider model selection early.

Test them directly: Run the same prompts across different models and compare results. Check whether generated tests follow your Page Object Model, use the correct fixtures, respect naming conventions, and include proper assertions. Quality varies significantly between models.

Our 🔥 take: free and low-cost models generate more AI slop than anything else; they're not worth it. In our testing, the winners were Opus 4.6 and Codex 5.3. Only after switching to them did we see real results and how useful the agents can be.

Even with capable models, clear boundaries help limit unintended changes. Agents can generate and modify code, and without defined limits, they may change more than intended. A selector fix can affect logic, or a locator update can reshape test structure.

Limit Agent Scope

The problem: Teams sometimes give agents unrestricted scope. The Healer may try to fix a failing test and modify more than necessary. The Generator might create tests in the wrong directories or overwrite existing files.

To avoid this, set clear boundaries:

  • Directory restrictions: Restrict where agents can write code. If your tests live in tests/, configure agents to generate files only in that directory. Exclude fixtures, helpers, and configuration files from agent edits.
  • Confirmation for rewrites: Require human review before agents modify existing tests. The Healer can propose fixes, but engineers should review and approve changes before they are applied.
  • Prevent architecture shifts: Configure agents to follow existing patterns, not introduce new ones. For example, if your suite uses the Page Object Model, agents should not generate tests with inline selectors. If you use fixtures, agents should not bypass them.
  • Limit modification scope: When the Healer fixes a failing test, it should change only the failing locator, not refactor surrounding code. Scope each fix to the minimum change required to restore functionality.

Example configuration:

  • Agents write only to tests/generated/.
  • Agents cannot modify files in tests/fixtures/ or tests/pages/.
  • Locator changes require pull request approval.
  • Generated tests must import from ./fixtures, not @playwright/test.
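
Boundaries like these are easier to keep when something enforces them. Below is a hedged sketch of a CI check for the last rule, assuming generated tests live in tests/generated/ and use the .spec.ts suffix (requires Node 20+ for recursive readdirSync).

import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const GENERATED_DIR = "tests/generated"; // assumed output directory

// Flag any generated spec that imports directly from @playwright/test
// instead of the shared ./fixtures module.
const offenders = readdirSync(GENERATED_DIR, { recursive: true })
  .map((entry) => join(GENERATED_DIR, entry))
  .filter((file) => file.endsWith(".spec.ts"))
  .filter((file) => /from\s+["']@playwright\/test["']/.test(readFileSync(file, "utf8")));

if (offenders.length > 0) {
  console.error("Generated tests must import from ./fixtures:", offenders.join(", "));
  process.exit(1);
}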

Some teams only notice the need for boundaries after agents make broader changes than intended, such as modifying shared authentication logic while fixing a selector. Requiring agents to suggest changes rather than apply them directly helps prevent this. With the right model and clear limits in place, teams still benefit from faster test generation and safer maintenance.

9. Equip Your Agent with AI Skills

AI agents are only as effective as the knowledge they have access to. Generic AI assistance often falls short when it comes to testing—agents rely on outdated patterns, miss framework-specific nuances, and produce tests that are flaky by design. To get consistent, high-quality output, you need to provide agents with specialized expertise.

This is where Agent Skills come in. Skills are a new open standard created by Anthropic for providing expertise to agents without bloating the context window. They're now available in all major AI development tools, including Claude Code, Cursor, VS Code, and Google Gemini.

Why skills matter for Playwright testing:

  • Framework-specific guidance: Generic AI knows about testing in general, but may not know the latest Playwright APIs, auto-waiting behavior, or recommended locator strategies.
  • Consistent patterns: Skills encode your preferred patterns—Page Object Model, fixture usage, assertion styles—so agents generate code that matches team conventions.
  • Reduced drift: Without explicit guidance, agents often introduce variations or outdated practices. Skills keep generated code aligned with current best practices.
  • Focused context: Instead of loading entire documentation into prompts, skills provide precisely the relevant knowledge when needed.

We released the Playwright Best Practices Skill specifically for this purpose. It gives AI agents specialized guidance for writing, debugging, and maintaining Playwright tests in TypeScript, covering everything from locators and assertions to CI/CD configuration and advanced patterns like multi-user testing and GraphQL mocking.

Install the skill and start generating better Playwright tests:

npx skills add https://github.com/currents-dev/playwright-best-practices-skill

Once installed, the AI automatically uses the skill when your questions or tasks involve Playwright—no manual configuration required. Instead of hoping the agent knows current best practices, you give it explicit, up-to-date knowledge that produces maintainable tests from the start.

Teams that combine proper skills with the strategies in this article see compounding benefits: agents follow both team conventions (from seed files and architecture) and framework best practices (from skills), resulting in tests that require less review and fewer corrections.

What Happens When Teams Use Agents the Right Way

Teams applying these strategies often see improvements: test creation becomes faster, maintenance effort drops, and coverage grows. Metrics help track the change, but the most noticeable difference shows up in daily work.

Time allocation shifts. Engineers spend less time fixing selectors. QA teams write less repetitive setup code. Senior engineers review fewer test pull requests. Freed time is spent on exploratory testing, risk analysis, architecture planning, and edge-case exploration.

Locator stability improves early. Agents verify selectors against running applications during generation. Flake rates decrease as role-based locators become standard and fragile patterns are avoided.

Consistency grows across teams. Generated tests follow seed files, which keep fixture and page object usage aligned. New engineers see examples that match team conventions, making onboarding faster.

Timing issues surface sooner. The Generator executes flows during test creation, catching race conditions, async rendering issues, or slow API calls before they reach CI. In one case, a checkout flow issue was identified and fixed before release.

Reviewers face less cognitive load. They focus on logic and coverage rather than checking selectors. Some teams report that senior engineers now spend fewer hours reviewing tests, freeing time for higher-level planning.

Agents become part of daily workflows. Within weeks, developers and QA engineers use the Planner, Generator, and Healer regularly. Confidence in agents grows, and teams review generated code with a mindset of verification rather than doubt. Reviews move faster, coverage expands, and teams deliver features more smoothly.

The key takeaway: strong teams use agents to handle routine work, not to replace judgment. Agents reduce repetitive effort, allowing engineers to focus on tasks that need context and experience.

Moving Forward

Playwright Test Agents won't fix broken test architectures or remove the need for solid engineering practices. They amplify whatever foundation you build, for better or worse. Teams seeing results use agents intentionally. They define clear patterns, selector strategies, and review processes that treat agent output like junior engineer code, preventing architectural drift.

Testing at scale requires orchestration beyond agent capabilities. You need test distribution, failure management, and CI optimization. Agents don’t fix broken foundations. They multiply what you already have. If your test suite has a solid structure, agents accelerate everything. If it doesn’t, fix that first.



Trademarks and logos mentioned in this text belong to their respective owners.
