Playwright Custom Reporters: Build Your Own
A Playwright custom reporter takes 20 minutes to prototype and months to maintain. Here's what it actually takes to build one that holds up in production CI.

Building a Playwright custom reporter may look simple at first. The Reporter interface has a clean surface area, and the first prototype that logs test names to stdout takes about 20 minutes to write. A class, a few methods, an export default, and you're done. The reporter runs fine on your machine and passes the first CI run. That simplicity breaks down as requirements grow.
Suppose a team builds a working Slack-on-failure reporter one afternoon. A week later, someone asks for failure history, as in, "Has this test failed before?" Then someone wants flakiness trends. The team lead wants a dashboard the whole org can see, and suddenly there's a GitHub Actions workflow step that fires a Lambda that writes to DynamoDB. Six months after that afternoon, a senior engineer is debugging why DynamoDB writes are silently failing on retries, why Slack alerts stopped firing the week of a Playwright minor version bump, and why the dashboard shows 47 tests running when the suite has 312. None of that was on the original whiteboard. What started as a 200-line file is now a maintenance liability with no owner of record.
This article is about avoiding that outcome, or at least making it a deliberate choice. We'll walk through the full Reporter interface and what each hook actually gives you. We'll look at where reporters routinely break under parallelism and sharding. We'll cover async handling so your reporter doesn't quietly drop data or hang CI. And we'll close with a clear-eyed framework for deciding whether to build, extend, or reach for a platform like Currents, which is built to handle the whole problem space.
By the end, you'll have what you need to build a reporter that holds up under real CI load, along with enough context to decide whether building one is the right call.
The Reporter Interface: Complete Map
Before writing any reporter code, you need to understand the full contract. The Playwright Reporter interface has 10 lifecycle methods and one utility method. Each one has specific data availability and timing characteristics that determine whether your reporter produces correct output.
If you've worked with Jest's TestEvents or Mocha's runner emitter, Playwright's design will feel different. The reporter is a typed class with explicit lifecycle hooks rather than an event emitter you subscribe to with string event names.
That design produces stricter typing and clearer ordering guarantees, but it also means the constraints in this article (synchronous-by-default for most hooks, strict main-process isolation, silent error swallowing) are baked into the design rather than being accidents. Knowing that helps explain why some patterns that work in other frameworks won't work here.
Lifecycle Overview
The reporter lifecycle tracks test execution. Here is every method, what fires when, and what the data looks like at that moment.
- onBegin(config, suite) runs exactly once at the start of the run, after all test files have been discovered and resolved. config is the fully resolved FullConfig object, meaning that merges from playwright.config.ts, CLI overrides, and environment variables have all been applied. suite is the root Suite containing the full tree of child suites and TestCase objects. The suite tree in onBegin reflects the CLI filters already applied, including --grep, --project, and file path arguments. If you're building an "expected test set" for gap detection, you're working with the filtered set, not the full suite.
- onTestBegin(test, result) is called when a test starts. The result object exists at this point but is almost empty. Its status field is not yet set, duration is zero, and attachments is empty. The only safe fields to read here are result.retry (which attempt this is) and result.startTime. Do not make decisions based on result.status in onTestBegin.
- onStepBegin(test, result, step) signals the start of a test step. The TestStep object includes title, category, parent (a reference to the parent step if nested), location (source file and line number), and startTime. The category field matters more than most reporters acknowledge. It distinguishes between Playwright API calls ("pw:api"), user-defined steps from test.step() ("test.step"), expect assertions ("expect"), fixture setup and teardown ("fixture"), hooks ("hook"), and attachment calls ("test.attach"). A reporter that displays steps without filtering by category will include Playwright's internal API calls in its output, producing noise that obscures the test narrative. Most reporters that render steps should filter on category === "test.step" or category === "expect". Starting with v1.50, steps carry attachments, and starting with v1.51, they carry annotations.
- onStepEnd(test, result, step) is the counterpart to onStepBegin and gets called when a step completes. The step now has a duration field and an error field if it failed. Steps can contain nested sub-steps through step.steps, so the tree can go arbitrarily deep. If you're computing step timing for performance analysis, this is where you read it.
- onStdOut(chunk, test, result) receives stdout output produced in a worker process. The test and result arguments may be undefined if output happens outside a test context, such as in a fixture setup that runs before any test has started. Every reporter that captures console output must guard against an undefined test with something like if (test) { ... }.
- onStdErr(chunk, test, result) mirrors onStdOut for stderr. The same undefined-test caveat applies.
- onTestEnd(test, result) fires when a test finishes. This is the primary hook for most reporters, and result is fully populated here. Every field you care about, including status, duration, attachments, errors, and steps, is available. There are two false-alert sources to watch for here. The first is retries: if --retries 2 is configured, onTestEnd runs up to three times for the same test, once per attempt. The first two calls may have result.status === "failed", and the third may have result.status === "passed". A reporter that emits a failure notification inside onTestEnd without accounting for retries will fire false alerts on tests that ultimately pass. The second is expected failures: a test marked test.fail() has expectedStatus === "failed", and result.status === "failed" means the test passed according to its own contract. A reporter that compares result.status without also checking test.expectedStatus will alert on tests doing exactly what they're supposed to do. The reliable signal in both cases is test.outcome() in onEnd, which collapses retries and respects expected status. The Currents team has an article on flakiness and how to fix it for readers who want to dig further into the underlying problem.
- onError(error) catches errors that occur outside test execution, like unhandled exceptions in worker processes or failures in global setup or teardown scripts. This is not where fixture errors tied to a specific test appear. Those land in TestResult.errors for that test. The onError hook is for the class of failures where no test was running when the error occurred.
- onEnd(result) runs once after all tests have completed or testing has been interrupted. The FullResult object it receives contains status ("passed", "failed", "timedout", or "interrupted"), startTime, and duration. It does not contain aggregate pass/fail counts or total test numbers. If you need those, you must accumulate them yourself across onTestEnd calls throughout the run. Like onExit, this method is async-safe and Playwright awaits any Promise you return from it. There is one non-obvious capability worth knowing: onEnd can return an object with a status field to override the run's exit code. This is the documented mechanism for reporters that need to signal failure on conditions beyond test pass/fail, such as when the reporter itself hits an unrecoverable error.
- onExit() is called immediately before the test runner exits, after onEnd has completed. It is async-safe, meaning Playwright awaits any promise you return. For artifact uploads and external I/O that depend on other reporters' output, onExit is the safer choice. It fires after every reporter has completed onEnd, which means no other reporter's output is still in flight. There is one important exception, covered later in the attachments section: file-system artifacts like traces and screenshots may already be cleaned up by onExit time if preserveOutput is set to its default. Upload those eagerly in onTestEnd or onEnd instead.
- printsToStdio() is a method that Playwright calls to determine whether your reporter produces terminal output. Return false if your reporter sends data to an external system and doesn't write to stdout. When you return false, Playwright adds a standard terminal reporter (like list or dot) automatically, so the developer running tests still sees progress. Return true (or omit the method, which defaults to true) and Playwright assumes your reporter is handling the terminal. If it isn't, the terminal goes silent during the run, which is confusing for anyone watching the logs.
One nuance worth knowing here: returning false only triggers the automatic terminal reporter injection when no other terminal-printing reporter is already registered. If your config has both ['list'] and your custom reporter, the printsToStdio() value on your reporter doesn't cause Playwright to add a second one. This matters during refactors. If you delete the ['list'] entry from a config without checking your custom reporter's printsToStdio() value, you can ship a CI run with no terminal output and no obvious cause.
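The category filtering described for onStepBegin and onStepEnd can be sketched as a small tree walk. This is a minimal illustration rather than Playwright's own logic: StepLike is a hand-written structural subset of TestStep so the snippet runs without Playwright installed, and the names narrativeSteps and NARRATIVE_CATEGORIES are made up for this example.

```typescript
// Structural subset of Playwright's TestStep, declared locally for the sketch.
interface StepLike {
  title: string;
  category: string;
  duration: number;
  steps: StepLike[];
}

interface NarrativeStep {
  title: string;
  depth: number;
  duration: number;
}

// Categories that tell the test's story; everything else is treated as noise.
const NARRATIVE_CATEGORIES = new Set(["test.step", "expect"]);

// Walk the step tree depth-first and keep only test.step and expect entries.
// Children of filtered-out steps are still visited, so an expect nested under
// a pw:api or hook step is kept at the depth of its nearest kept ancestor.
function narrativeSteps(steps: StepLike[], depth = 0): NarrativeStep[] {
  const out: NarrativeStep[] = [];
  for (const step of steps) {
    const keep = NARRATIVE_CATEGORIES.has(step.category);
    if (keep) {
      out.push({ title: step.title, depth, duration: step.duration });
    }
    out.push(...narrativeSteps(step.steps, keep ? depth + 1 : depth));
  }
  return out;
}
```

In a reporter, a call such as narrativeSteps(result.steps) inside onTestEnd would produce the flattened list, since the step tree is complete by then.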
The table below summarizes data availability across the major hooks:
| Hook | result.status | result.duration | result.attachments | result.errors |
|---|---|---|---|---|
| onTestBegin | not set | 0 | empty | empty |
| onStepBegin | not set | 0 | partial | partial |
| onStepEnd | not set | 0 | partial | partial |
| onTestEnd | fully set | fully set | fully set | fully set |
| onEnd | run status only | run duration | n/a | n/a |
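
Because FullResult carries no counts, a reporter has to keep its own tally. The sketch below is one way to do that under a simplifying assumption: it keeps only the latest attempt per test id, so an early failed attempt is replaced by the retry that follows. RunTally is a hypothetical helper, and it does not account for test.fail(), so test.outcome() remains the signal to use for alerting.

```typescript
type AttemptStatus = "passed" | "failed" | "timedOut" | "skipped" | "interrupted";

class RunTally {
  // Latest attempt per test id; a later retry overwrites the earlier one.
  private latest = new Map<string, { status: AttemptStatus; retry: number }>();

  // Call from onTestEnd with test.id, result.status, and result.retry.
  record(testId: string, status: AttemptStatus, retry: number): void {
    const prev = this.latest.get(testId);
    if (!prev || retry >= prev.retry) {
      this.latest.set(testId, { status, retry });
    }
  }

  // Call from onEnd to get counts of each test's final attempt status.
  counts(): Record<AttemptStatus, number> {
    const totals: Record<AttemptStatus, number> = {
      passed: 0,
      failed: 0,
      timedOut: 0,
      skipped: 0,
      interrupted: 0,
    };
    this.latest.forEach(({ status }) => {
      totals[status] += 1;
    });
    return totals;
  }

  // Number of distinct tests that reported at least one attempt.
  get total(): number {
    return this.latest.size;
  }
}
```

Comparing total against the size of the expected test set from onBegin is a cheap way to notice tests that never reported.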
The Suite and TestCase Object Model
The hooks tell you when things happen. The objects passed into those hooks tell you what those things are. Two of those objects, Suite and TestCase, define the structural side of the API. Most reporters need to reason about test organization (which file, which describe block, which project), and that information lives here rather than in TestResult.
onBegin gives you the root Suite, which is the entry point into a tree structure you need to understand before building anything that reasons about test organization. The root suite has child suites for each project, each of which has child suites for each file, each of which has child suites for each test.describe block, bottoming out in TestCase objects.
The fields you'll reach for most often on TestCase are:
- test.id: A stable identifier computed from file name, title, and project name. It's unique within a session and stable across runs as long as those three inputs don't change, which makes it the right key for any reporter that persists results to an external store. Renaming a test or moving its file changes the id, so reporters tracking test history need a backup matching strategy (typically titlePath()) for the rename case. Adding a new project to playwright.config.ts also generates new IDs for every test that runs under it, which is a common surprise for reporters tracking cross-run history. One additional caveat applies in sharded runs: each shard runs its own Playwright process and assigns IDs independently. The same test will get the same id across shards only if file path, title, and project name match, which they should under normal sharding. Anything that perturbs those inputs between shards (different config paths, project name overrides per shard) will produce divergent IDs for the same logical test. The merge-reports section later in this article discusses the related issue of shard projects being preserved as separate TestProject instances.
- test.titlePath(): Returns the full path from root to this test as an array of strings, giving you the file, describe block, and test title in one call.
- test.outcome(): Returns "expected", "unexpected", "flaky", or "skipped" after all retries have completed. This is a much cleaner signal than manually tracking retry counts. Use it in onEnd to determine the final verdict on each test.
- test.annotations: The annotations declared at test or suite level.
- test.tags: The tags array.
- test.expectedStatus: What the test is expected to do ("passed" for normal tests, "failed" for test.fail(), "skipped" for test.skip() or test.fixme()).
- test.parent: The enclosing Suite. Useful for any reporter that needs to roll results up by describe block or file.
- test.location: An object with file, line, and column. Required for any reporter that produces source-linked output, including the GitHub Annotations format used by Playwright's built-in github reporter.
- suite.allTests(): Flattens the full suite tree into a flat array of all TestCase objects. Useful in onBegin to build your expected test map upfront.
One mechanical but important note: the custom reporter class must use export default. Playwright uses a dynamic import to load the reporter file, and it expects the default export to be the reporter class. Missing export default produces an error that looks like a config issue rather than an export issue, and you'll waste time tracking it down.
Here is a snippet that traverses the suite tree in onBegin to build an expected test map, keyed by test.id, with each entry capturing the file path, full title path, and tags:
```typescript
import type {
  Reporter,
  FullConfig,
  Suite,
  TestCase,
  TestResult,
  FullResult,
} from "@playwright/test/reporter";

interface TestRecord {
  titlePath: string[];
  filePath: string;
  tags: string[];
  outcome?: "expected" | "unexpected" | "flaky" | "skipped";
}

class MyReporter implements Reporter {
  private expectedTests = new Map<string, TestRecord>();

  onBegin(config: FullConfig, suite: Suite) {
    for (const test of suite.allTests()) {
      this.expectedTests.set(test.id, {
        titlePath: test.titlePath(),
        filePath: test.location.file,
        tags: test.tags,
      });
    }
  }

  onTestEnd(test: TestCase, result: TestResult) {
    // Fill in outcome as each test finishes
    const record = this.expectedTests.get(test.id);
    if (record) {
      record.outcome = test.outcome();
    }
  }

  onEnd(result: FullResult) {
    // Any test still without an outcome never received onTestEnd,
    // which usually means it was in-flight when a worker crashed.
    for (const [id, record] of this.expectedTests) {
      if (!record.outcome) {
        console.error(`Test ${record.titlePath.join(" > ")} never completed`);
      }
    }
  }
}

export default MyReporter;
```
TestResult: What's Actually In It
If Suite and TestCase describe what the test is, TestResult describes what happened when it ran. This is the object every meaningful reporter spends most of its time with. Whatever you want to know about a test run (did it pass, how long it took, what it logged, what artifacts it produced) is here.
These are the fields you'll actually use:
- status: "passed", "failed", "timedOut", "skipped", or "interrupted". This is the status of this specific attempt, not the final verdict across all retries.
- duration: Wall-clock milliseconds for this attempt.
- error: Shortcut for errors[0], the first error thrown during execution.
- errors: An array of TestError objects, each with message, stack, and an optional source code snippet.
- attachments: Array of attachments with name, contentType, and either path (for file-based attachments) or body (for inline Buffer attachments). This is where traces, screenshots, videos, and custom testInfo.attach() calls land.
- retry: Which retry attempt this result represents, zero-indexed. On a run with retries: 2, you'll see retry values of 0, 1, and 2.
- startTime: Absolute Date of when this attempt started.
- steps: The full step tree mirroring the onStepBegin/onStepEnd calls.
- stdout and stderr: Arrays of string or Buffer chunks written during this test.
- workerIndex: Which worker process ran this test. This number is unique per worker instance for the entire run; when a worker crashes and is replaced, the new worker gets a new workerIndex. The value is -1 if the test never ran (for example, if testing was interrupted before this test started).
- parallelIndex: The worker's slot, an integer between 0 and workers - 1. Unlike workerIndex, this value is reused. When a worker crashes and is replaced, the replacement gets the same parallelIndex as the worker it replaced. This is the right field for slot-based resource allocation, where you want a stable identifier tied to the slot rather than to the specific process. Test accounts, database shards, and seeded data partitions are all good fits for parallelIndex. workerIndex is the right field for logging and debugging, where you want to distinguish between different process instances even if they shared a slot.
- annotations (added in v1.52): The list of annotations applicable to this test result, including annotations added dynamically during execution via testInfo.annotations. Useful for reporters that process custom annotation types like test owner or priority.
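To make the slot-based allocation idea concrete, here is a minimal sketch of mapping parallelIndex to a pre-seeded test account. The account pool, its usernames, and the accountForSlot helper are all hypothetical; the only assumption about Playwright is that parallelIndex is an integer slot number that gets reused when a worker is replaced.

```typescript
interface TestAccount {
  username: string;
}

// Hypothetical pool: one pre-seeded account per worker slot.
const ACCOUNT_POOL: TestAccount[] = [
  { username: "e2e-user-0" },
  { username: "e2e-user-1" },
  { username: "e2e-user-2" },
  { username: "e2e-user-3" },
];

// Pick the account that belongs to a slot. A replacement worker inherits the
// slot, so it also inherits the account of the worker that crashed.
function accountForSlot(parallelIndex: number, pool: TestAccount[]): TestAccount {
  if (!Number.isInteger(parallelIndex) || parallelIndex < 0) {
    throw new Error(`Invalid parallelIndex: ${parallelIndex}`);
  }
  if (parallelIndex >= pool.length) {
    throw new Error(
      `No account for slot ${parallelIndex}; the pool has ${pool.length}. ` +
        "Lower the worker count or grow the pool."
    );
  }
  return pool[parallelIndex];
}
```

On the test side the same number is available as testInfo.parallelIndex, so a fixture could call accountForSlot(testInfo.parallelIndex, ACCOUNT_POOL), while the reporter can log result.parallelIndex next to result.workerIndex to show which slot and which process ran each attempt.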
The retry field is the source of the most common reporter logic bug. With retries: 2 configured, onTestEnd fires up to three times for the same test. A reporter that tracks failures by listening for result.status === "failed" in onTestEnd will count intermediate retry failures as real failures. The correct approach is to use test.outcome() in onEnd. It returns "flaky" if the test passed on a retry and "unexpected" if it failed all attempts. Only "unexpected" deserves a failure alert.
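As a rough sketch of that rule, the helper below sorts tests into "alert" and "flaky" buckets using only outcome(). TestLike is a hand-written structural subset of TestCase so the snippet runs on its own, and classifyForAlerts is a name invented for this example.

```typescript
type Outcome = "expected" | "unexpected" | "flaky" | "skipped";

// Structural subset of Playwright's TestCase, declared locally for the sketch.
interface TestLike {
  titlePath(): string[];
  outcome(): Outcome;
}

interface Verdicts {
  alert: string[]; // failed every attempt: worth a notification
  flaky: string[]; // passed on a retry: worth tracking, not paging
}

// Classify tests by their final outcome. Intended to run in onEnd, after
// every retry has finished.
function classifyForAlerts(tests: TestLike[]): Verdicts {
  const verdicts: Verdicts = { alert: [], flaky: [] };
  for (const test of tests) {
    // Drop empty path segments, if any, before joining for display.
    const name = test.titlePath().filter(Boolean).join(" > ");
    const outcome = test.outcome();
    if (outcome === "unexpected") {
      verdicts.alert.push(name);
    } else if (outcome === "flaky") {
      verdicts.flaky.push(name);
    }
  }
  return verdicts;
}
```

A reporter would typically keep the root suite from onBegin and call classifyForAlerts(suite.allTests()) in onEnd.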
That's the API. Hooks, the suite tree, and TestResult together cover everything Playwright will hand to your reporter. What none of it tells you is how to organize a reporter that stays reliable across a year of CI runs, team turnover, and Playwright version bumps. That's the next section.
Building the Reporter: Architecture Decisions First
With the API contract clear, the next decisions are architectural. These choices determine whether your reporter stays maintainable over time or becomes the thing nobody wants to touch.
Class vs. Object Literal
Playwright accepts either a class or a plain object implementing the Reporter interface. Use a class. An object literal has no constructor, which means you can't validate required configuration at startup, can't initialize instance state cleanly, and can't separate setup logic from hook logic. The only reason to use an object literal is for a throwaway debugging reporter you plan to delete almost immediately.
A class gives you a constructor that can fail fast if required options are missing. Finding out on test 1 that an API key is missing beats finding out on test 247, when your reporter has been silently dropping results because every HTTP call returned 401. Here is a scaffold with option validation in the constructor:
```typescript
import type {
  Reporter,
  FullConfig,
  Suite,
  TestCase,
  TestResult,
  FullResult,
} from "@playwright/test/reporter";

interface ReporterOptions {
  webhookUrl: string;
  projectName?: string;
}

class DashboardReporter implements Reporter {
  private webhookUrl: string;
  private projectName: string;
  private failures: { title: string; error: string }[] = [];
  private reporterErrors: Error[] = [];

  constructor(options: ReporterOptions) {
    if (!options.webhookUrl) {
      throw new Error(
        "DashboardReporter: webhookUrl is required. " +
          "Pass it via playwright.config.ts reporter options."
      );
    }
    this.webhookUrl = options.webhookUrl;
    this.projectName = options.projectName ?? "unknown";
  }

  printsToStdio() {
    return false;
  }

  // hooks follow
}

export default DashboardReporter;
```
Registering in playwright.config.ts
Once the class is defined, Playwright needs to know about it. Reporters are registered under the reporter array in playwright.config.ts. Each entry is either a string for built-ins or a [path, options] tuple for custom reporters.
Here is a configuration that runs the custom reporter alongside the built-in list reporter so the terminal remains active:
```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  reporter: [
    ["list"],
    [
      "./reporters/dashboard-reporter.ts",
      {
        webhookUrl: process.env.DASHBOARD_WEBHOOK_URL,
        projectName: "checkout-flow",
      },
    ],
  ],
});
```
There is a subtle behavior here worth knowing. When you pass --reporter on the CLI, it overrides the config's reporter array entirely. If your CI workflow uses npx playwright test --reporter=dot, your custom reporter does not run, regardless of what's in the config. If your CI config uses an explicit --reporter flag, you need to include the custom reporter in the CLI argument. You can chain multiple reporters on the CLI with comma separation, as in --reporter=dot,./reporters/dashboard-reporter.ts. The CLI form has no syntax for the options object, so a reporter registered this way has to read its settings from somewhere else, such as environment variables.
Sync vs. Async Hooks
Two reporter methods are documented to return a Promise that Playwright awaits: onEnd and onExit. Async work inside these hooks is safe. You can make HTTP calls, wait for database writes, or flush buffers, and Playwright will hold the process open until they resolve.
Other hooks like onTestEnd, onTestBegin, and the step hooks are a different story. At the time of writing, Playwright does not await Promises returned from them. The Playwright team has explicitly closed feature requests asking for async support on these hooks, stating they are not planning to change them. This matters for reporter design. If you write an async onTestEnd that awaits an HTTP call, the hook function will return a pending Promise that Playwright drops on the floor. The side effects inside the hook still run, but Playwright moves on without waiting. Any error from the async work becomes an unhandled rejection, and you lose the ordering guarantees you probably assumed you had.
The practical implication is to do synchronous work in onTestEnd. Collect the data you need into instance state and defer the real I/O to onEnd or onExit, where the Promise actually gets awaited. This also sidesteps the per-test latency problem. A 100ms HTTP call fired from every onTestEnd adds up fast on a 500-test suite, and batching in onEnd removes that overhead entirely.
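One way to structure the deferred flush is to split the buffered results into fixed-size batches before sending them. The helper below is a minimal sketch; the batch size and the idea that the receiving endpoint caps payload size are assumptions for illustration, not something Playwright imposes.

```typescript
// Split buffered results into batches of at most batchSize items.
// The last batch holds whatever remains.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Inside an async onEnd, a loop over toBatches(bufferedResults, 100) with one awaited request per batch turns 500 per-test calls into five, and every one of those requests is awaited by Playwright.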
The more critical async issue is error handling. Playwright intentionally swallows all errors thrown inside reporter methods. This is a documented design decision. If your onTestEnd throws an unhandled exception, Playwright catches it, discards it, and moves on. The test run finishes with exit code 0. Your reporter can be broken on every hook call without producing any visible signal, and you will never know unless you add your own instrumentation.
So try/catch inside every hook, sync or async, isn't optional. Catching errors is also not enough on its own. You need a place to put them. The pattern that works is collecting reporter errors in an instance variable and checking that collection in onEnd, where you can return { status: 'failed' } to fail the CI job if the reporter hit an unrecoverable error:
```typescript
import type {
  Reporter,
  TestCase,
  TestResult,
  FullResult,
} from "@playwright/test/reporter";

class DashboardReporter implements Reporter {
  private reporterErrors: Error[] = [];
  private failures: { testId: string; title: string; error: string }[] = [];

  onTestEnd(test: TestCase, result: TestResult) {
    try {
      if (result.status === "failed" || result.status === "timedOut") {
        this.failures.push({
          testId: test.id,
          title: test.titlePath().join(" > "),
          error: result.error?.message ?? "unknown error",
        });
      }
    } catch (err) {
      // Playwright swallows this, so we collect it ourselves
      this.reporterErrors.push(err as Error);
      process.stderr.write(`[DashboardReporter] onTestEnd error: ${err}\n`);
    }
  }

  async onEnd(result: FullResult) {
    try {
      await this.flushToWebhook(result);
    } catch (err) {
      this.reporterErrors.push(err as Error);
      process.stderr.write(`[DashboardReporter] onEnd flush error: ${err}\n`);
    }
    if (this.reporterErrors.length > 0) {
      // Override exit code to signal reporter failure in CI
      return { status: "failed" as const };
    }
  }

  private async flushToWebhook(result: FullResult) {
    // HTTP call with timeout, covered in the integrations section
  }
}
```
Writing to process.stderr in the catch block matters. It's the fallback signal that confirms something went wrong even when the reporter can't affect the exit code, and it surfaces in CI logs without requiring any special infrastructure.
Extending and Composing Built-in Reporters
Before building from scratch, ask whether composition solves the problem without any custom code. The most common scenario is "I want the standard HTML report plus a Slack notification on failure." That's two reporters in the config array, each doing its own job, with no custom code needed for the HTML side. Registering multiple reporters is fully supported, and each one sees the same event stream independently.
When you do need to extend or delegate to a built-in, you can import types from @playwright/test/reporter and wrap an inner reporter instance. This is the right approach for "everything the JSON reporter does, plus my custom metadata fields."
The main thing to be aware of is that delegation requires calling the inner reporter's hooks in the correct order and handling any errors it throws. If your wrapper calls innerReporter.onTestEnd(test, result) and the inner reporter throws, that exception lands on your reporter's call stack, so Playwright swallows it along with your own errors. That can mask bugs in the inner reporter's output.
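A delegating wrapper that keeps the inner reporter's failures visible might look like the sketch below. ReporterLike is a two-method structural stand-in for Playwright's Reporter interface so the snippet is self-contained; a real wrapper would type against Reporter and forward every hook it cares about. The class and method names are illustrative.

```typescript
// Structural stand-in for the parts of Reporter used in this sketch.
interface ReporterLike {
  onTestEnd?(test: unknown, result: unknown): void;
}

class DelegatingReporter implements ReporterLike {
  // Errors thrown by the inner reporter, kept separate from our own.
  readonly innerErrors: Error[] = [];
  private readonly inner: ReporterLike;

  constructor(inner: ReporterLike) {
    this.inner = inner;
  }

  onTestEnd(test: unknown, result: unknown): void {
    // Delegate first so the inner reporter sees events in the usual order.
    this.guard("onTestEnd", () => this.inner.onTestEnd?.(test, result));
    // Wrapper-specific work (extra metadata, custom fields) would go here.
  }

  // Run one delegated call and record any exception instead of letting it
  // escape into the wrapper's own hook.
  private guard(hook: string, call: () => void): void {
    try {
      call();
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      this.innerErrors.push(error);
      console.error(`[DelegatingReporter] inner ${hook} threw: ${error.message}`);
    }
  }
}
```

Keeping innerErrors separate makes it possible to tell in onEnd whether the wrapper or the wrapped reporter is the one misbehaving.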
Environment and Build Metadata
Whether you build from scratch or extend a built-in, most reporters exist to attach context that Playwright doesn't capture natively, such as build ID, commit SHA, PR number, branch name, or test owner. Read these in the constructor from process.env, not inside hooks. Reading the same environment variable in every onTestEnd call adds unnecessary overhead and scatters the initialization logic.
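A small helper keeps that constructor-time read in one place. The sketch below is illustrative only: the variable names COMMIT_SHA, GIT_BRANCH, and PR_NUMBER are placeholders, since every CI provider exposes this information under its own names.

```typescript
interface CiMetadata {
  commitSha: string;
  branch: string;
  prNumber: number | null;
}

// Read CI context once from an env-like object. Variable names are
// placeholders; substitute whatever your CI provider sets.
function readCiMetadata(env: Record<string, string | undefined>): CiMetadata {
  const pr = Number.parseInt(env["PR_NUMBER"] ?? "", 10);
  return {
    commitSha: env["COMMIT_SHA"] ?? "unknown",
    branch: env["GIT_BRANCH"] ?? "unknown",
    // Non-PR builds usually have no PR number, so fall back to null.
    prNumber: Number.isNaN(pr) ? null : pr,
  };
}

// In the reporter constructor (assumed usage):
//   this.ci = readCiMetadata(process.env);
// Hooks then read this.ci instead of touching process.env again.
```

Taking the environment as a parameter also makes the helper easy to unit test without mutating process.env.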
For metadata tied to the Playwright run rather than the CI environment, FullConfig.metadata is a cleaner option. This config-level field accepts arbitrary key-value pairs and is accessible in onBegin. You can set it in playwright.config.ts and any reporter can read it:
```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  metadata: {
    buildId: process.env.BUILD_ID,
    branch: process.env.GIT_BRANCH,
    prNumber: process.env.PR_NUMBER,
  },
  reporter: [["./reporters/dashboard-reporter.ts"]],
});

// dashboard-reporter.ts (method excerpt from inside the reporter class)
onBegin(config: FullConfig, suite: Suite) {
  this.buildId = config.metadata.buildId as string;
  this.branch = config.metadata.branch as string;
}
```
globalSetup Timing and Lazy Initialization
Reporters are active during globalSetup and globalTeardown, and the reporter constructor runs before globalSetup executes. If your reporter depends on state that globalSetup establishes (a database connection string written to an environment variable, a temp directory path, or a service URL resolved at startup), you cannot initialize that dependency in the constructor. The dependency doesn't exist yet.
The solution is lazy initialization in onBegin, which fires after globalSetup completes. Check for the dependency there and initialize it before any tests run:
```typescript
// DatabaseClient stands in for whatever database driver you use.
class DatabaseReporter implements Reporter {
  private dbConnection: DatabaseClient | null = null;
  private connectPromise: Promise<DatabaseClient> | null = null;

  onBegin(config: FullConfig, suite: Suite) {
    // process.env.DB_URL was set by globalSetup
    const dbUrl = process.env.DB_URL;
    if (!dbUrl) {
      throw new Error("DatabaseReporter: DB_URL not set. Did globalSetup run?");
    }
    // Start connecting but don't block onBegin. Playwright may not await it.
    // Await this Promise inside hooks that are documented as awaited (onEnd, onExit).
    this.connectPromise = DatabaseClient.connect(dbUrl);
    // Attach a no-op handler so a failed connection is not reported as an
    // unhandled rejection before onEnd awaits it; the await below still rejects.
    this.connectPromise.catch(() => {});
  }

  async onEnd(result: FullResult) {
    if (this.connectPromise) {
      this.dbConnection = await this.connectPromise;
    }
    // ... use this.dbConnection to flush results
  }
}
```
Only onEnd and onExit are documented to await returned Promises, so any async initialization you kick off in onBegin must be resolved inside one of those hooks before you rely on it. Kicking off the work in onBegin and awaiting the handle later keeps startup parallelized with test discovery without assuming guarantees Playwright doesn't currently provide.
This pattern works for batch-flush reporters that only touch the connection in onEnd. If your reporter streams results during the run (writing to the connection from onTestEnd), there's a real problem: onTestEnd is not awaited, so you can't safely await connectPromise inside it. The two viable options are blocking onBegin synchronously on the connection (which slows test startup but is safe) or buffering writes in onTestEnd and flushing them on a fast timer until the connection is live. There is no clean third option that avoids either blocking startup or accepting delivery uncertainty.
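The buffering option can be sketched as a small queue that accepts writes synchronously and forwards them once a connection is attached. Note the swap: instead of polling on a timer, this version drains at the moment the connection attaches, which is simpler to reason about. BufferedWriter and Sink are hypothetical names, and the sink stands in for whatever write call your connection exposes.

```typescript
// Hypothetical sink: whatever the live connection exposes for writes.
type Sink<T> = (items: T[]) => void;

class BufferedWriter<T> {
  private pending: T[] = [];
  private sink: Sink<T> | null = null;

  // Safe to call from onTestEnd: synchronous, and it never fails just
  // because the connection is not ready yet.
  write(item: T): void {
    this.pending.push(item);
    this.drain();
  }

  // Call once the connection is live, for example from the .then callback
  // of the connect Promise. Everything buffered so far is forwarded.
  attach(sink: Sink<T>): void {
    this.sink = sink;
    this.drain();
  }

  // Items still waiting. Anything left here at onEnd needs a final flush
  // or an explicit "dropped N results" log line.
  get backlog(): number {
    return this.pending.length;
  }

  private drain(): void {
    if (!this.sink || this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    // If the sink throws, this batch is lost; wrap the sink if that matters.
    this.sink(batch);
  }
}
```

The delivery uncertainty mentioned above does not go away: if the connection never comes up, backlog simply keeps growing, so onEnd should check it and decide whether to fail the run.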
One edge case worth testing is that errors during globalSetup trigger onError in the reporter before onBegin ever fires. If your onError handler depends on state that onBegin was supposed to initialize, it will either throw or silently do nothing. Test the globalSetup failure path explicitly. It's the kind of edge case that only surfaces in production when a deploy script breaks.
State Accumulation Across Hooks
Once the reporter is initialized, the next architectural concern is what it remembers between hook calls. Most non-trivial reporters accumulate state across hooks. They collect failures for a summary, build a structured result object, or track durations for percentile calculations. This state lives on the reporter instance, initialized in the constructor.
One practice worth adopting from the start: in onTestEnd, serialize only the fields you need from TestResult into a plain object before storing them. Don't store references to TestResult objects directly. TestResult is a rich object with nested references, and holding a thousand of them for a 1,000-test suite creates memory pressure in resource-constrained CI containers. Extracting only what you need keeps the memory footprint predictable:
```typescript
interface StoredResult {
  testId: string;
  title: string;
  status: string;
  duration: number;
  retry: number;
  errorMessage: string | undefined;
  tags: string[];
}

onTestEnd(test: TestCase, result: TestResult) {
  const stored: StoredResult = {
    testId: test.id,
    title: test.titlePath().join(" > "),
    status: result.status,
    duration: result.duration,
    retry: result.retry,
    errorMessage: result.error?.message,
    tags: test.tags,
  };
  this.results.push(stored);
}
```
Parallelism: The Most Common Source of Reporter Bugs
The previous sections cover the API and the architecture. This section covers the runtime environment where most production reporters eventually break.
Playwright runs tests across multiple worker processes, and your reporter runs in the main process. The hooks you implement do not run inside workers. They run in the main process, receiving serialized events that workers send back as tests progress. That separation has consequences for ordering, for what state your reporter can hold, and for how you reason about partial results when something goes wrong on a worker.
The first and most common consequence is that events do not arrive in the order your test files declare them. Tests run in parallel across workers, and the reporter sees them in completion order. The next subsection covers what that means in practice. After that, we'll look at what happens when workers crash mid-run, how sharding splits a single test plan across multiple Playwright processes, and how the blob reporter and merge-reports change the picture entirely.
One related question worth answering up front: what if your tests need to pass data to the reporter beyond what the standard hooks provide? Workers cannot reach the reporter directly. Events flow from worker to main process through a serialization layer, and user data has to ride one of four channels: stdout (string-only, readable via onStdOut), attachments (the only option for binary or large data, via testInfo.attach()), annotations (string values, accessible via result.annotations after v1.52), or test steps (visible in the step tree).
The right choice depends on the data shape. Use attachments for any binary or large payload. Use annotations for short structured metadata that should appear alongside the test in the report. Use stdout for log-shaped output that benefits from streaming. Use steps when the data should appear in the test narrative itself. Stuffing structured data into stdout and parsing it on the reporter side works but is the most fragile option, since any other library writing to stdout in the same test produces noise the reporter has to filter out.
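As a rough sketch of the annotation channel, the helper below pulls one annotation type out of the list a reporter receives. The Annotation interface is a local stand-in for Playwright's { type, description } objects, and both the helper name and the "owner" annotation type are hypothetical, chosen only for illustration.

```typescript
// Local stand-in for Playwright's annotation objects ({ type, description? }).
interface Annotation {
  type: string;
  description?: string;
}

// Hypothetical helper: collect the descriptions of every annotation of one type.
// On the test side, the data could be attached with something like
//   test.info().annotations.push({ type: "owner", description: "payments" });
export function annotationValues(
  annotations: Annotation[],
  type: string
): string[] {
  return annotations
    .filter((a) => a.type === type && a.description !== undefined)
    .map((a) => a.description as string);
}
```

Because the values are plain strings, anything structured has to be encoded by the test and decoded by the reporter, which is a reason to keep this channel for short metadata.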
Event Ordering is Not Guaranteed
Because tests run on different workers and finish at different times, onTestEnd events from parallel workers arrive in completion order, not in the declaration order from your test files. A reporter that assumes tests within a file complete sequentially will produce incorrect output under fullyParallel: true or any run with more than one worker. Design reporter state to be order-independent.
The practical implication is to accumulate into a Map keyed by test.id, not into an ordered array. Arrays work when tests complete in source order. Maps always work:
// Fragile: assumes completion order matches declaration order
private results: StoredResult[] = [];
// Correct: order-independent, keyed by stable test identifier
private results = new Map<string, StoredResult[]>();
onTestEnd(test: TestCase, result: TestResult) {
const existing = this.results.get(test.id) ?? [];
existing.push({
testId: test.id,
title: test.titlePath().join(" > "),
status: result.status,
retry: result.retry,
duration: result.duration,
errorMessage: result.error?.message,
tags: test.tags,
});
this.results.set(test.id, existing);
}
Storing an array per test ID handles retries naturally. Each attempt appends to the same entry, and in onEnd you can call test.outcome() to get the final verdict without inspecting the individual attempts.
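One way to make the order-independence visible is to sort only when the report is written. The sketch below assumes a simplified Attempt shape (a subset of the stored result) and a hypothetical toStableReport helper; it is an illustration, not part of Playwright's API.

```typescript
// Simplified stand-in for one stored attempt.
interface Attempt {
  retry: number;
  status: string;
  duration: number;
}

// Flatten the per-test Map into a sorted list so the output does not depend
// on the order in which workers happened to finish.
export function toStableReport(
  results: Map<string, Attempt[]>
): { testId: string; attempts: Attempt[] }[] {
  return Array.from(results.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([testId, attempts]) => ({
      testId,
      // Copy before sorting so the reporter's own state is left untouched.
      attempts: [...attempts].sort((x, y) => x.retry - y.retry),
    }));
}
```

Two runs of the same suite then produce entries in the same order, which keeps diffs between report files meaningful.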
Worker Crashes and Missing Results
Ordering is one consequence of running tests across workers. Worker reliability is another. The suite tree in onBegin tells you every test that should run. Worker crashes mean some of those tests will never call onTestEnd. The main process doesn't receive a failure signal for tests that were in-flight when a worker died. They just never report back.
A reporter that builds its expected test set from onBegin and compares it against received results in onEnd can detect these gaps. The snippet earlier in the Suite and TestCase section shows the pattern. Any test.id with no recorded outcome after onEnd is a test that never completed. Whether you treat this as a warning, an alert, or a hard failure depends on your team's conventions, but detecting it is only possible if you model the expectation upfront.
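The comparison itself can live in a small pure function. In this sketch the helper name is hypothetical; the expected IDs would come from something like suite.allTests().map((t) => t.id) in onBegin, and the received IDs from the keys of whatever the reporter accumulated in onTestEnd.

```typescript
// Return the IDs that were planned in onBegin but never reported in onTestEnd.
export function findMissingTests(
  expectedIds: Iterable<string>,
  receivedIds: Iterable<string>
): string[] {
  const received = new Set(receivedIds);
  return Array.from(expectedIds).filter((id) => !received.has(id));
}
```

Calling it at the start of onEnd gives the reporter a concrete list to log or alert on.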
Sharding and the Reporter's Perspective
Workers split a single Playwright run across processes on one machine. Sharding splits it across machines, and the reporter's view of the run changes accordingly. The two mechanisms solve different problems and have different operational tradeoffs, which the Currents team has written about in detail for teams choosing between them. Under --shard, each shard runs a separate Playwright process with its own reporter instance. If your reporter sends results to an external service, each shard sends its own partial results independently. There is no cross-shard aggregation at the reporter level.
Whether this breaks your reporter depends on what the reporter does. A reporter that writes a results.json file will produce n separate files across n shards, each containing one shard's results. A reporter that posts to a webhook will post n times. A reporter that counts total pass/fail will report per-shard counts, instead of totals.
The key to handling this correctly is detecting the shard context in onBegin. config.shard is either null (no sharding) or { total: number, current: number }. Two things to know here. First, current is one-based, so shard 1 of 4 is { total: 4, current: 1 }. That's easy to mix up with zero-indexed worker indices. Second, for reporters that generate file output, including the shard number in the file name prevents shards from overwriting each other:
onBegin(config: FullConfig, suite: Suite) {
const shard = config.shard;
if (shard) {
// shard.current is 1-based
this.outputFile = `results-shard-${shard.current}-of-${shard.total}.json`;
} else {
this.outputFile = "results.json";
}
}
For reporters that post to an external service, the right design in a sharded run is either to include the shard context in every payload (letting the service aggregate) or to post partial results per shard and let the service handle assembly.
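A minimal sketch of the first option, under a few assumptions: the payload shape, the runId field (for example a CI build ID), and both function names are hypothetical, and the aggregation function stands in for logic that would live in the receiving service, not in the reporter.

```typescript
// Local stand-ins; `runId` is a hypothetical identifier such as a CI build ID.
interface ShardInfo {
  current: number; // 1-based, as in config.shard
  total: number;
}

interface ShardPayload {
  runId: string;
  shard: ShardInfo | null;
  passed: number;
  failed: number;
}

// Reporter side: tag each payload with the shard it came from.
export function buildShardPayload(
  runId: string,
  shard: ShardInfo | null,
  statuses: string[]
): ShardPayload {
  return {
    runId,
    shard,
    passed: statuses.filter((s) => s === "passed").length,
    failed: statuses.filter((s) => s === "failed" || s === "timedOut").length,
  };
}

// Service side: sum the partial counts for one run and report whether every
// shard has arrived. Assumes all payloads share the same runId.
export function aggregateRun(payloads: ShardPayload[]): {
  passed: number;
  failed: number;
  complete: boolean;
} {
  const expected = payloads[0]?.shard?.total ?? 1;
  const seen = new Set(payloads.map((p) => p.shard?.current ?? 1));
  return {
    passed: payloads.reduce((sum, p) => sum + p.passed, 0),
    failed: payloads.reduce((sum, p) => sum + p.failed, 0),
    complete: payloads.length > 0 && seen.size === expected,
  };
}
```

The complete flag matters because shards finish at different times, so the service should not publish totals until every shard has reported.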
Blob Reporter and merge-reports: Playwright's Built-in Cross-Shard Solution
Since v1.37, Playwright provides a built-in solution to cross-shard aggregation: the blob reporter combined with the merge-reports CLI. This is worth knowing because it changes the architecture for a large class of use cases.
Here's how it works. You configure reporter: 'blob' in the CI run. Each shard produces a zip file in the blob-report directory containing a complete serialized record of that shard's results. After all shards are complete, a separate CI job runs:
npx playwright merge-reports --reporter html ./all-blob-reports
The merge step calls the same Reporter API that your custom reporter implements. If your reporter file is passed to --reporter in the merge command, it receives a unified stream of events from all shards as if they had run on one machine. No cross-shard aggregation code needed.
There is one subtlety specific to merge-reports that affects custom reporters. When projects are merged from different shards, each shard's copy of the same project becomes a separate TestProject object in onBegin. If Desktop Chrome was sharded across five machines, onBegin receives five project instances with the same name. A reporter that assumes one project name equals one project instance will produce five duplicate entries. Guard against this by deduplicating on project name when iterating projects.
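The guard can be as small as the sketch below. ProjectLike is a local stand-in for the project objects the reporter sees in onBegin (for example config.projects), and the helper name is hypothetical.

```typescript
// Local stand-in: only the project name matters for deduplication.
interface ProjectLike {
  name: string;
}

// Collapse per-shard copies of a project into one entry per name,
// keeping the order in which names were first seen.
export function uniqueProjectNames(projects: ProjectLike[]): string[] {
  return Array.from(new Set(projects.map((p) => p.name)));
}
```

Any per-project summary the reporter builds should then be keyed by these names, not by object identity.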
The blob plus merge-reports combination has real limits worth discussing. The output is one combined report per run. There is no history across runs, no flakiness trends across days or weeks, no real-time visibility while shards are still executing, and no shared team access to past results. If your needs stop at "one merged report after the run finishes," this is the right tool. In practice, needs keep growing from there.
Handling Attachments, Traces, and Rich Data
Parallelism shapes what events your reporter sees and when. Attachments shape what the reporter can do with those events. Most of the debugging value in a Playwright run lives in the attachments: traces, screenshots, videos, and anything your tests add via testInfo.attach(). A custom reporter that ignores attachments is throwing away the richest data Playwright produces. A custom reporter that mishandles them creates a new class of bugs involving missing files and failed uploads.
Accessing Attachments in onTestEnd
TestResult.attachments contains everything attached to a test, both what your tests add via testInfo.attach() and what Playwright generates automatically.
You'll typically see three kinds of built-in attachments:
- Traces, with name: 'trace' and contentType: 'application/zip'
- Screenshots, with name: 'screenshot' and contentType: 'image/png'
- Videos, with name: 'video' and contentType: 'video/webm'
Each attachment has either a path to a file on disk or a body containing a Buffer.
For a reporter that uploads traces to an external store:
onTestEnd(test: TestCase, result: TestResult) {
const trace = result.attachments.find(
(a) => a.name === "trace" && a.contentType === "application/zip"
);
if (trace?.path) {
// Queue for upload. Don't await here unless you need blocking.
this.traceUploadQueue.push({
testId: test.id,
tracePath: trace.path,
});
}
}
The Attachment Path Lifecycle
Reading attachments is straightforward. Holding onto them long enough to actually use them is where reporters get into trouble. This is the detail that breaks trace-uploading reporters in production. Attachment paths point to files in Playwright's output directory, which defaults to test-results/. Whether those files survive to the end of the run depends on the preserveOutput config option, not outputDir. The default value of preserveOutput is "always", which keeps output for all tests. But many CI setups override this to "failures-only" to save disk space, which means output directories for passing tests are cleaned up at the end of the run.
If your config (or your CI template) sets preserveOutput: 'failures-only', a reporter that wants to upload traces from passed tests cannot leave the work for later. It has to start each upload in onTestEnd, while the file still exists, and wait for the outstanding uploads in onEnd. Deferring the work until after onExit is too late. The safe rule is to process and upload attachments eagerly, in onTestEnd or at the latest in onEnd, which completes before Playwright's output cleanup pass runs. Follow it regardless of the preserveOutput setting, since it protects you against config changes you didn't anticipate.
If your use case requires retaining all traces regardless of outcome, confirm preserveOutput is set to its default 'always' (or set it explicitly) in the project config. Changing it to 'failures-only' or 'never' saves CI disk space but breaks deferred uploads.
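The eager pattern can be sketched as a small queue that starts work immediately and is drained later. The class name and the UploadFn signature are assumptions for illustration; the real upload function would wrap whatever storage client you use.

```typescript
// Hypothetical upload signature: takes a file path, resolves when stored.
type UploadFn = (path: string) => Promise<void>;

export class EagerUploadQueue {
  private upload: UploadFn;
  private pending: Promise<void>[] = [];
  private errors: Error[] = [];

  constructor(upload: UploadFn) {
    this.upload = upload;
  }

  // Call from onTestEnd. The upload starts right away, while the file
  // still exists, and failures are recorded instead of thrown.
  enqueue(path: string): void {
    const task = (async () => {
      try {
        await this.upload(path);
      } catch (err) {
        this.errors.push(err instanceof Error ? err : new Error(String(err)));
      }
    })();
    this.pending.push(task);
  }

  // Call from onEnd. Waits for every started upload and returns the
  // errors that were collected along the way.
  async flush(): Promise<Error[]> {
    await Promise.all(this.pending);
    return this.errors;
  }
}
```

In a reporter, onTestEnd would call enqueue(trace.path) without awaiting, and onEnd would await flush() and decide what to do with any returned errors.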
Building a Trace-Aware Reporter
A reporter that uploads traces on failure needs to handle three things: detecting the right failure statuses, locating the trace attachment, and handling the case where no trace exists. No trace may exist because trace collection is disabled, or because the configured trace mode did not record this attempt (on-first-retry, for example, records nothing for the initial attempt). Always guard the trace lookup:
onTestEnd(test: TestCase, result: TestResult) {
const isFailed =
result.status === "failed" || result.status === "timedOut";
if (!isFailed) return;
const trace = result.attachments.find(
(a) => a.name === "trace" && a.contentType === "application/zip"
);
if (!trace?.path) {
// Trace collection may be disabled, or the trace mode skipped this attempt
process.stderr.write(
`[TraceReporter] No trace found for failed test: ${test.title}\n`
);
return;
}
this.failedTraces.push({ testId: test.id, path: trace.path });
}
Testing Your Reporter
By this point you have enough patterns to build a reporter that handles hooks, parallelism, and attachments correctly. The next question is how you know it will still work tomorrow.
A custom reporter is production infrastructure. It runs on every test execution in your organization, and because Playwright swallows reporter errors, it can fail completely without producing any visible signal. It needs its own test suite, CI pipeline, and maintenance budget.
Unit Testing Hook Logic
The most testable pattern is to extract result-processing logic from hook methods into pure functions that take a TestResult-shaped object and return processed output. These functions have no side effects and no Playwright dependencies, so they are trivially testable with any test runner.
// Pure function. No reporter state, no Playwright imports needed in tests.
export function serializeResult(
testId: string,
titlePath: string[],
tags: string[],
// Structural subset of TestResult, so tests can pass plain objects.
result: {
status: string;
duration: number;
retry: number;
error?: { message?: string };
}
): StoredResult {
return {
testId,
title: titlePath.join(" > "),
status: result.status,
duration: result.duration,
retry: result.retry,
errorMessage: result.error?.message,
tags,
};
}
// In reporter hook. Delegates to the pure function.
// Here this.results is a Map<string, StoredResult> that keeps the latest attempt per test.
onTestEnd(test: TestCase, result: TestResult) {
this.results.set(
test.id,
serializeResult(test.id, test.titlePath(), test.tags, result)
);
}
Integration Testing with a Fixture Suite
Unit tests cover result processing logic. They cannot cover the reporter's full lifecycle, including how it behaves under parallel workers, retries, and attachment handling. For that, you need a real Playwright run against a dedicated fixture suite maintained in the reporter's repository.
The fixture suite should be small, 10 to 20 tests, but deliberately designed to exercise every code path the reporter cares about. Include tests that pass, tests that fail, tests that pass on retry (to generate "flaky" outcomes), tests with custom testInfo.attach() calls, and tests that time out. Run it in CI as the reporter's integration test, separate from your application's test suite.
There's one critical thing to validate beyond "did the reporter throw an exception." You also need to confirm it produced the expected output. Because Playwright swallows reporter errors, a reporter that throws on every hook call will still pass the test run from Playwright's perspective. Your integration tests need to check the reporter's actual artifact, meaning the file was written, the HTTP calls were made, and the database rows exist.
Testing Under Sharding
The fixture suite covers single-process behavior. Sharded runs introduce a second category of tests worth running.
If your reporter is designed for sharded runs, test it explicitly with --shard=1/2 and --shard=2/2 against the fixture suite. Verify that partial result handling is correct and that the reporter doesn't crash or produce invalid output when it sees only a subset of the tests defined in the suite. If you plan to use the reporter with merge-reports, test that path too. Generate blob reports from sharded runs, merge them, and confirm the reporter produces correct unified output.
Testing for Silent Failures
Because Playwright swallows reporter errors, you need a specific test strategy for failure cases. Intentionally break your reporter by throwing inside onTestEnd or returning a rejected promise from onEnd. Then verify that your error collection mechanism captured the error and that the run exit code reflects the reporter failure. If you skip this test, you might discover your reporter has been silently failing in production only when someone notices the Slack alerts have been quiet for weeks.
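The test above needs an error collection mechanism to assert against. One possible shape is sketched here; the class and method names are hypothetical, and the object returned by finalStatus is meant to be returned from onEnd in the same way as the status override used elsewhere in this article.

```typescript
export class ReporterErrors {
  private errors: { hook: string; error: Error }[] = [];

  // Run a hook body and record anything it throws, tagged with the hook name.
  guard(hook: string, body: () => void): void {
    try {
      body();
    } catch (err) {
      this.errors.push({
        hook,
        error: err instanceof Error ? err : new Error(String(err)),
      });
    }
  }

  // Copy of the recorded errors, for logging or for assertions in tests.
  list(): { hook: string; error: Error }[] {
    return [...this.errors];
  }

  // Status override to return from onEnd; undefined means "leave it alone".
  finalStatus(): { status: "failed" } | undefined {
    return this.errors.length > 0 ? { status: "failed" } : undefined;
  }
}
```

A deliberately broken hook body passed to guard should show up in list() and flip finalStatus(), which is the behavior the silent-failure test should verify.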
Integrating Custom Reporters with External Systems
Testing confirms your reporter works. Integration decides what your reporter is for. Every custom reporter eventually sends data somewhere, and the shape of that somewhere (an HTTP endpoint, a file on disk, a Slack channel) changes the architecture. The three patterns below cover the vast majority of real-world reporters, and each has pitfalls that only show up once you're running against a real CI pipeline.
HTTP-Based Result Emission
The majority of custom reporter use cases involve sending data somewhere: a webhook, a dashboard API, or a notification service. The architectural decisions here affect how much overhead the reporter adds to your CI pipeline.
Per-test HTTP calls in onTestEnd are the most intuitive pattern and the most problematic at scale. Because Playwright does not await the Promise returned from onTestEnd, an async onTestEnd that fires an HTTP call gives you an unhandled Promise per test. The request still goes out, but errors are silently dropped, you have no delivery guarantees, and a slow downstream service can leave hundreds of requests in flight by the time the run ends. Out-of-order delivery is also possible, since Playwright moves on without waiting. The default recommendation is to accumulate results synchronously in onTestEnd and flush them in a single call from onEnd, where Playwright does await the Promise.
When you need live streaming, the cost is manageable with the right transport. A persistent WebSocket or an HTTP connection with keep-alive avoids the per-request TCP handshake overhead. Use a persistent HTTP agent for connection pooling:
import { Agent, setGlobalDispatcher } from "undici";
import type {
Reporter,
FullConfig,
FullResult,
TestCase,
TestResult,
} from "@playwright/test/reporter";
// Set once at module load so all fetch calls reuse connections
setGlobalDispatcher(
new Agent({
keepAliveTimeout: 10_000,
keepAliveMaxTimeout: 60_000,
connections: 4,
})
);
interface StoredResult {
testId: string;
title: string;
status: string;
duration: number;
retry: number;
errorMessage: string | undefined;
tags: string[];
}
class DashboardReporter implements Reporter {
private webhookUrl: string;
private buildId: string = "";
private branch: string = "";
private results: StoredResult[] = [];
private reporterErrors: Error[] = [];
constructor(options: { webhookUrl: string }) {
if (!options.webhookUrl) {
throw new Error("DashboardReporter: webhookUrl is required");
}
this.webhookUrl = options.webhookUrl;
}
printsToStdio() {
return false;
}
onBegin(config: FullConfig) {
this.buildId = (config.metadata?.buildId as string) ?? "unknown";
this.branch = (config.metadata?.branch as string) ?? "unknown";
}
onTestEnd(test: TestCase, result: TestResult) {
try {
this.results.push({
testId: test.id,
title: test.titlePath().join(" > "),
status: result.status,
duration: result.duration,
retry: result.retry,
errorMessage: result.error?.message,
tags: test.tags,
});
} catch (err) {
this.reporterErrors.push(err as Error);
}
}
async onEnd(result: FullResult) {
try {
const response = await fetch(this.webhookUrl, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
buildId: this.buildId,
branch: this.branch,
status: result.status,
results: this.results,
}),
// Abort after 8 seconds. AbortSignal.timeout leaves no timer to clear on the error path.
signal: AbortSignal.timeout(8_000),
});
if (!response.ok) {
throw new Error(`Webhook returned ${response.status}`);
}
} catch (err) {
this.reporterErrors.push(err as Error);
process.stderr.write(
`[DashboardReporter] Webhook call failed: ${err}\n`
);
}
if (this.reporterErrors.length > 0) {
return { status: "failed" as const };
}
}
}
export default DashboardReporter;
External HTTP calls can fail. A reporter that silently drops results on a 500 response undermines its own purpose. At minimum, retry transient failures once, and use exponential backoff if you allow more than one retry. Set explicit timeouts on all external calls, and keep the total retry budget bounded. onEnd has no hard timeout, so an unbounded HTTP call will hang the CI process.
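A bounded retry helper might look like the sketch below. The function names are hypothetical, and for brevity it retries on any thrown error; a production version would probably retry only on timeouts and 5xx responses. The sleep parameter is injectable so tests do not have to wait for real delays.

```typescript
// Delay before retry number `attempt` (1-based): base, 2x base, 4x base, ...
export function backoffDelayMs(attempt: number, baseMs: number): number {
  return baseMs * 2 ** (attempt - 1);
}

// Run `operation`, retrying up to `maxRetries` times with exponential backoff.
// Rethrows the last error once every attempt has failed.
export async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries: number,
  baseMs: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (attempt > 0) {
      await sleep(backoffDelayMs(attempt, baseMs));
    }
    try {
      return await operation();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

In onEnd, the webhook call would be wrapped as withRetry(() => postResults(), 1, 500), where postResults is whatever function performs the fetch and throws on a non-OK response. The worst-case total is every per-call timeout plus every delay, so size the numbers against how long you are willing to hold the CI job.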
Writing to File
File-based reporters share one common pitfall, which is the choice between writing incrementally in onTestEnd and writing once in onEnd. Incremental writes with file appends produce partially valid files if the run is interrupted mid-way. Writing the full output in onEnd is safer but requires holding all results in memory.
For suites beyond a few hundred tests, newline-delimited JSON (NDJSON) is worth the small additional parsing overhead. Each line is an independent JSON object, appended in onTestEnd. The file is always parseable even if the run is interrupted. You don't hold the full result set in memory. Standard tools like jq can process NDJSON line-by-line. Writing a single monolithic JSON array in onEnd becomes problematic for suites over roughly five thousand tests, where the accumulated object creates memory pressure in resource-constrained CI containers.
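The NDJSON approach reduces to two small functions, sketched here with hypothetical names. In a reporter, onTestEnd would append each line with something like fs.appendFileSync(outputFile, toNdjsonLine(stored)); the file I/O is left out so the example stays self-contained.

```typescript
// Serialize one record as a single NDJSON line. JSON.stringify escapes
// newlines inside strings, so each record stays on one line.
export function toNdjsonLine(record: object): string {
  return JSON.stringify(record) + "\n";
}

// Parse NDJSON text, skipping blank lines and any line that fails to parse,
// such as a truncated final line left behind by an interrupted run.
export function parseNdjson(text: string): unknown[] {
  const records: unknown[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      records.push(JSON.parse(line));
    } catch {
      // Partial or corrupt line: ignore it and keep the rest.
    }
  }
  return records;
}
```

The tolerant parser is what makes an interrupted run recoverable: every fully written line is still usable, and only the line that was cut off is lost.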
The Failure Notification Pattern
The single most common custom reporter use case is sending Slack or PagerDuty alerts on failure. This sounds like the simplest of all possible reporters, which is exactly why so many teams ship one in an afternoon and then spend months fixing its edge cases.
The first instinct is to filter for failures in onTestEnd and fire a notification for each one. Don't do that. A suite with 20 failures produces 20 Slack messages, which is noise rather than signal. The right shape is to collect failures during the run and send a single summary in onEnd with all of them grouped together, along with a link to the CI job.
Retry handling is the second place this pattern goes wrong. A test that fails on retry 0 and passes on retry 2 is a flaky test that ultimately passed, and alerting on it trains your team to ignore Slack. The fix is to resolve each test's final verdict using test.outcome() in onEnd rather than reacting to individual result.status values during the run. Only "unexpected" outcomes warrant an alert. A "flaky" outcome means the test recovered on retry and does not warrant an alert.
The third place, and the one that bites hardest, is the silent-failure problem inside the notification code itself. If the Slack webhook is down and your notification call throws, Playwright swallows the error and the run exits cleanly. Your team believes everything is fine while real failures go unreported. Wrap the notification call in try/catch, log the failure to stderr as a fallback that shows up in CI logs, and let the reporter's error-collection pattern override the exit code so the CI job fails loudly.
Putting all of that together:
import type {
Reporter,
FullResult,
TestCase,
TestResult,
} from "@playwright/test/reporter";
class SlackReporter implements Reporter {
private failures = new Map<string, { title: string; error: string }>();
private testCases = new Map<string, TestCase>();
private reporterErrors: Error[] = [];
printsToStdio() {
return false;
}
onTestEnd(test: TestCase, result: TestResult) {
// Store the TestCase reference. We'll call test.outcome() in onEnd,
// after all retries have completed and the verdict is final.
this.testCases.set(test.id, test);
if (result.status === "failed" || result.status === "timedOut") {
this.failures.set(test.id, {
title: test.titlePath().join(" > "),
error: result.error?.message?.slice(0, 200) ?? "no error message",
});
}
}
async onEnd(result: FullResult) {
// Filter to only tests that failed all attempts
const realFailures = [...this.failures.entries()].filter(([id]) => {
const testCase = this.testCases.get(id);
return testCase?.outcome() === "unexpected";
});
if (realFailures.length === 0) return;
const message = this.buildSlackMessage(realFailures, result);
try {
const response = await fetch(process.env.SLACK_WEBHOOK_URL!, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(message),
// Abort after 5 seconds. AbortSignal.timeout leaves no stray timer behind.
signal: AbortSignal.timeout(5_000),
});
if (!response.ok) {
throw new Error(`Slack webhook returned ${response.status}`);
}
} catch (err) {
this.reporterErrors.push(err as Error);
// Write to stderr as fallback. Visible in CI logs even if Slack is down.
process.stderr.write(
`[SlackReporter] Failed to send Slack notification: ${err}\n` +
`Failures were: ${realFailures.map(([, f]) => f.title).join(", ")}\n`
);
}
if (this.reporterErrors.length > 0) {
return { status: "failed" as const };
}
}
private buildSlackMessage(
failures: [string, { title: string; error: string }][],
result: FullResult
) {
return {
text: `Test suite failed: ${failures.length} test(s) failed`,
blocks: [
{
type: "section",
text: {
type: "mrkdwn",
text: `*${failures.length} test failure(s)* in \`${process.env.GIT_BRANCH ?? "unknown"}\``,
},
},
...failures.slice(0, 10).map(([, f]) => ({
type: "section",
text: {
type: "mrkdwn",
text: `• *${f.title}*\n\`${f.error}\``,
},
})),
],
};
}
}
export default SlackReporter;
The Ongoing Cost: Maintaining a Custom Reporter
Building the reporter is a one-time cost. Everything after that is a recurring tax. Most people don't budget for it, which is why so many custom reporters quietly decay into half-working infrastructure that nobody wants to touch.
Playwright Version Compatibility
Playwright ships new versions frequently. The Reporter interface itself is stable, but the objects it passes gain new fields, change field semantics, and occasionally deprecate patterns. TestResult.annotations was added in v1.52, TestResult.parallelIndex in v1.30, TestCase.id in v1.25. A reporter built and tested against one version will accumulate subtle drift as Playwright evolves.
Each Playwright upgrade in your project should include a reporter compatibility check. This is only practical if you maintain the dedicated integration test suite described earlier. If you skip integration tests, you'll discover compatibility breakage in CI, not in code review.
Feature Creep from Stakeholders
This is the most expensive problem with custom reporters, and it doesn't appear on any technical risk register. The reporter starts life as a narrow integration, maybe a Slack alert on failure or a JSON dump for the team dashboard. Then the requests start arriving. Someone in QA wants Jira tickets auto-created on failure. A product manager wants to see flakiness over the last sprint. A director wants a view of test health across every repo in the org. No single request sounds unreasonable on its own. The engineer who built the reporter is the natural person to handle each one, so the scope grows request by request until what started as a 200-line file is now an unofficial test analytics platform.
The engineer maintaining the reporter is now building infrastructure instead of writing tests. The scope escalation is predictable because it's driven by legitimate need. Teams with Playwright at scale genuinely need failure history, flakiness trends, org-wide visibility, and integrations with their project management tooling. The question is whether a custom reporter is the right vehicle for delivering all of that.
A short test for whether you've crossed the line from reporter to test analytics platform: if any of the following are true, you've built a product, not a reporter.
- The reporter requires its own database, message queue, or persistent storage layer beyond a flat file.
- The reporter has views or queries that span more than one test run.
- The reporter has user-facing UI of any kind, even a single HTML dashboard.
- The reporter has authentication, authorization, or per-user state.
- More than one engineer has been asked to maintain it.
None of these are wrong on their own. They are signals that the work has scaled past what the Reporter API is for. At that point the question shifts: do you want to staff a small platform team, or use one of the existing platforms that already does this work?
Monitoring the Reporter Itself
Even if you successfully push back on scope, the reporter that exists today still needs to keep working. Because Playwright swallows reporter errors, your reporter can fail silently for extended periods. The Slack alerts stop, and nobody notices because silence gets read as "no failures." You need monitoring for the reporter itself, including a heartbeat that confirms it ran, an alert when expected output doesn't appear, and periodic validation that external integrations are still reachable. This sits inside the broader question of test-suite health, where the reporter is one signal among several worth tracking.
This is operational overhead on top of a tool that was supposed to reduce operational overhead. It's real, it compounds with every new feature the reporter grows, and it usually lands on the same engineer who built the reporter.
Bus Factor
Monitoring tells you that the reporter broke. It doesn't tell anyone how to fix it. A custom reporter is typically built by one engineer who understood the Reporter API, the team's CI setup, and the integration target. When that engineer moves teams or leaves the company, the reporter becomes a black box. It keeps running until it doesn't, and the next person to investigate it inherits all the complexity from this article with none of the context.
Most custom reporters have no documentation, no architecture decision record, and a codebase that mixes hook logic, HTTP calls, and CI-specific workarounds in a single file. The bus factor on this kind of infrastructure is almost always one.
When to Build vs. When to Integrate
By this point in the article, you have absorbed what actually goes into a production-grade custom reporter. Ten lifecycle hooks with specific ordering guarantees and data availability windows. Silent error swallowing that requires explicit workarounds. Non-deterministic event ordering under parallelism. Sharding that breaks reporters assuming full-suite visibility. Attachment paths that expire based on preserveOutput config. A testing strategy that requires a dedicated fixture suite. Ongoing maintenance across Playwright version bumps, stakeholder feature creep, and team turnover.
That's the real cost. The question is whether that cost is justified for your specific need.
| Requirement | Build custom reporter | Use blob + merge-reports | Use a platform like Currents |
|---|---|---|---|
| Slack alert on failure | Low complexity, build it | Not relevant | Overkill |
| Internal dashboard integration | Build if API is stable | Not relevant | Depends on dashboard |
| Cross-shard result aggregation | Complex, build carefully | Built-in, free, works today | Native support |
| Trace storage and access | Very complex: storage, CDN, access control | Generates merged HTML with traces | Native support |
| Flakiness trend analytics | Very complex: historical data required | No history, single-run only | Native support |
| Real-time run progress visibility | Moderate complexity | Not supported (post-run only) | Native support |
| Multi-repo, org-wide reporting | High complexity | Requires custom tooling | Strong case for a platform |
| Ongoing maintenance | You own it: version compat, monitoring, bus factor | Minimal (Playwright-maintained) | Vendor-maintained |
For the first two rows in that table, a custom reporter is the right answer. The Reporter API is well-designed for targeted integrations. The effort is justified and the scope is bounded.
For cross-shard aggregation of a single run, use Playwright's blob reporter and merge-reports before building anything custom. It's free, Playwright-maintained, and it calls the same Reporter API, so your custom reporter can run in the merge step with little or no change. The main thing to check is the duplicate-project behavior described earlier.
For anything involving historical data, flakiness trends, real-time visibility, or org-wide access, you're building a product, not a reporter. Platforms like Currents exist because enough teams walked down the custom reporter path far enough to meet the full cost. Flakiness detection requires storing and querying test history across hundreds of runs. Trend analytics requires time-series data. Real-time visibility requires persistent server infrastructure and streaming protocols. Org-wide access requires authentication, authorization, and data retention policies. Each of those is a legitimate engineering problem, and none of them belong in a reporter class.
Closing
The Reporter interface is stable and well-designed. The challenges sit in the runtime environment: non-deterministic event ordering from parallelism, partial views from sharding, silent error swallowing that requires explicit instrumentation, and attachment paths that expire before deferred uploads can run.
A reporter that addresses these from the start is reliable. Discovering them after the reporter has been silently dropping failure notifications in CI for three months is an avoidable outcome, and hopefully, this article makes it one you avoid.
The other thing to take away is the scope boundary. The reporter you build today is not the reporter you'll be maintaining six months from now if stakeholder requests compound unchecked. The right response to the first scope expansion isn't automatically "build it." Before each new feature request, ask whether you're still building a reporter or whether you've crossed into building a test analytics product. A 200-line Slack notification reporter is worth owning. A historical trend engine with multi-repo support, flakiness quarantine, and a team-facing dashboard is a different category of investment. Draw the line deliberately, before the line draws itself.