LAP Benchmark v2: Measuring API Documentation Compression Efficacy for AI Coding Agents

How much can you compress API docs before AI agents lose effectiveness?
LAP Benchmark v2 Harness | Model: claude-sonnet-4-5-20250929 | Full Run: 500/500 runs, 50 specs, 5 formats, 5 tiers
0 Abstract

This report presents the full results of LAP Benchmark v2, a controlled evaluation measuring how API documentation compression affects AI coding agent performance. We tested five documentation tiers across 50 real-world production APIs spanning five specification formats (OpenAPI, AsyncAPI, GraphQL, Postman, and Protobuf), completing all 500 runs successfully using Claude Sonnet 4.5.

Primary finding: Providing any form of API documentation dramatically improves agent performance over no documentation (mean documented-tier score 0.84 vs none-tier 0.40, paired t-test p << 0.001). This effect is consistent across all 5 formats.

Compression finding: No statistically significant quality difference was detected between documented tiers (pretty, minified, LAP-Standard, LAP-Lean) at the current sample size (n=100 per tier, paired t-tests, all p > 0.05). LAP-Lean achieved the numerically highest score (0.851) compared to the pretty baseline (0.825), but this difference is directional only. However, LAP-format tiers achieve highly significant reductions in inference cost (35%, p < 0.001), wall time (29s faster per run, p < 0.001), and total token consumption (p < 0.001) compared to pretty-printed originals. LAP-Lean achieves 88% documentation token reduction while maintaining task performance parity.

Practical recommendation: For production AI coding workflows, LAP-Lean offers the best efficiency: equivalent quality at significantly lower cost and latency. Future work with repeated trials and additional models is needed to determine whether the directional quality advantage of LAP formats is a real effect or sampling noise.

1 Key Findings

Total Runs: 500 (500 completed, 0 timed out)
Completion Rate: 100.0% (500/500 runs succeeded)
Specs Tested: 50 (5 formats x 10 specs each)
Best Performing Tier: LAP-Lean (avg score 0.851)
Documentation Lift: +0.448 (best tier vs no-doc baseline score delta)
LAP-Lean Cost Savings: 35% (vs pretty tier baseline)
Best Efficiency Tier: LAP-Lean (highest score per 1K doc tokens)
Total Benchmark Cost: $130.68 (avg $0.2614 per run)
Core result: Any documentation dramatically outperforms no documentation (p << 0.001). Among documented tiers, LAP-Lean achieves the numerically highest score (0.851) while using ~88% fewer documentation tokens than pretty-printed originals. Differences between documented tiers are directionally consistent but do not reach statistical significance at the current sample size (n=100 per tier). However, LAP-Lean's cost savings, wall time reduction, and token efficiency gains are all statistically significant (p < 0.001). This result holds consistently across all 5 API specification formats tested.

2 Methodology

5-Tier Documentation System

Each API specification is compiled into five documentation tiers that span the spectrum from no documentation to the most verbose original format:

Tier | Description | Typical Size | Format Notes
none | No documentation provided (prior-knowledge baseline) | 0 tokens | Agent must rely solely on training data
pretty | Original format with full whitespace and comments | Baseline (1x) | YAML/JSON/GraphQL/Proto as-downloaded from source
minified | Whitespace removed, comments stripped | ~50-95% of pretty | Machine-readable, hard for humans to read
lap-standard | LAP structured format with endpoint descriptions and parameter types | ~10-50% of pretty | Structured text block, human and AI readable
lap-lean | LAP format with types only (no descriptions, no examples) | ~5-30% of pretty | Maximum compression while preserving endpoint schema

Scoring Rubric

Each run is scored on a 0-1 scale using three weighted components:

Component | Weight | What it Measures
Endpoint Identification (EP) | 60% | Correct API endpoint path and method identified and used
Parameter Accuracy (Par) | 30% | Required and optional parameters present with correct types
Code Quality (Code) | 10% | Executable Python code in response with endpoints and params in code blocks

Total score = 0.6 * endpoint + 0.3 * params + 0.1 * code
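The weighted-sum formula above can be sketched directly. This is an illustrative reimplementation, not the harness's actual scoring code; the function name is hypothetical:

```python
def score_run(endpoint: float, params: float, code: float) -> float:
    """Combine the three 0-1 component scores using the rubric weights
    (60% endpoint, 30% parameters, 10% code quality)."""
    return 0.6 * endpoint + 0.3 * params + 0.1 * code

# Example: perfect endpoint, two-thirds of params correct, clean code block
print(round(score_run(1.0, 0.667, 1.0), 3))  # 0.9
```

Because endpoint identification carries 60% of the weight, a run that picks the wrong endpoint caps out at 0.4 even with perfect parameters and code, which is why the none tier clusters near 0.4.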

Confounding Variable Mitigations

The benchmark applies several controls to isolate documentation quality as the independent variable:

Spec Selection - Full Benchmark (50 specs)

The full benchmark covers 50 real-world production APIs, 10 per format, chosen for diversity of size, domain, and schema complexity:

Format | Count | Specs
OPENAPI | 10 | box, digitalocean, figma, github-rest, plaid, resend, slack, spotify, stripe, twilio
ASYNCAPI | 10 | adeo-kafka, correlation-id, gitter-streaming, kraken-websocket, operation-security, rpc-server, slack-rtm, social-media, streetlights, websocket-gemini
GRAPHQL | 10 | artsy-gql, coral-gql, elastic-gql, github-gql, linear-gql, saleor-gql, shopify-gql, swapi-gql, unraid-gql, yelp-gql
POSTMAN | 10 | adobe-postman, akeneo-postman, auth0-postman, azure-devops-postman, braintree-postman, influxdb-postman, postman-echo, sap-postman, stripe-postman, twilio-postman
PROTOBUF | 10 | google-billing, google-datacatalog, google-firestore, google-language, google-pubsub, google-spanner, google-storage, google-talent, google-translate, google-vision

Run Statistics

The full benchmark executed 500 runs (50 specs x 5 tiers x 2 tasks). All 500 runs completed successfully (100%) with a 360-second execution time limit per run. This complete dataset provides full coverage across all spec-tier-task combinations with no missing data points.

3 Results - Tier Comparison

Average scores per documentation tier across all 500 completed runs.

Tier | Runs (done/total) | Avg Score | Endpoint | Params | Code | Avg Time | Avg Cost
None | 100/100 | 0.404 | 0.357 | 0.444 | 0.515 | 52.5s | $0.1307
Pretty | 100/100 | 0.825 | 0.777 | 0.902 | 0.857 | 100.9s | $0.3644
Minified | 100/100 | 0.835 | 0.792 | 0.896 | 0.856 | 105.6s | $0.3186
LAP-Std | 100/100 | 0.840 | 0.800 | 0.904 | 0.865 | 75.2s | $0.2556
LAP-Lean | 100/100 | 0.851 | 0.823 | 0.899 | 0.863 | 71.5s | $0.2375

Average Score per Tier (chart data)

None 0.404 | Pretty 0.825 | Minified 0.835 | LAP-Std 0.840 | LAP-Lean 0.851
Key insight: The none tier scores 0.404 on average, while LAP-Lean achieves 0.851 - a +0.448 improvement from documentation alone. This pattern holds across all 5 spec formats and 50 APIs tested, confirming documentation as the dominant factor in agent endpoint identification accuracy.

4 Results - Compression Analysis

How much does each tier compress the documentation compared to the pretty (original) baseline? Doc tokens are the tokens in the documentation file delivered to the agent. Score efficiency = avg score / (avg doc tokens / 1000). Averages across all 50 specs and both tasks.

Tier | Avg Doc Tokens | Compression Ratio | Token Savings | Avg Score | Score / 1K Doc Tokens
None | 0 | N/A | N/A | 0.404 | N/A (no doc)
Pretty | 126.6K | 1.00x | 0.0% | 0.825 | 0.007
Minified | 110.6K | 0.87x | 12.7% | 0.835 | 0.008
LAP-Std | 22.9K | 0.18x | 82.0% | 0.840 | 0.037
LAP-Lean | 14.9K | 0.12x | 88.3% | 0.851 | 0.057

Documentation Size by Tier (avg doc tokens, all 50 specs)

Pretty 126.6K | Minified 110.6K | LAP-Std 22.9K | LAP-Lean 14.9K
Compression efficiency: LAP-Lean achieves ~88% documentation token savings vs the pretty baseline, while LAP-Standard achieves ~82% savings. Minification alone saves only ~13% -- far less than LAP format compression. Both LAP tiers achieve scores equal to or above the pretty baseline (no statistically significant difference was detected), delivering far superior score-per-token efficiency.
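The score-efficiency metric defined above (avg score divided by thousands of doc tokens) can be reproduced from the table values. This is a minimal sketch using the reported tier averages; the dictionary layout is illustrative:

```python
# Per-tier averages from the compression table: (avg doc tokens, avg score)
tiers = {
    "pretty":   (126_600, 0.825),
    "minified": (110_600, 0.835),
    "lap-std":  (22_900, 0.840),
    "lap-lean": (14_900, 0.851),
}

for name, (doc_tokens, score) in tiers.items():
    efficiency = score / (doc_tokens / 1000)   # score per 1K doc tokens
    ratio = doc_tokens / tiers["pretty"][0]    # compression vs pretty baseline
    print(f"{name}: {ratio:.2f}x size, {efficiency:.3f} score/1K tokens")
```

Run as-is, this reproduces the table's 0.12x ratio and 0.057 score/1K tokens for LAP-Lean, roughly an 8x efficiency gain over the pretty baseline's 0.007.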

5 Results - Cost and Efficiency

Per-run cost, execution time, and token consumption by tier. Savings are calculated relative to the pretty (original format) baseline. The highlighted row (LAP-Lean) represents the recommended production tier. Total benchmark cost across all 500 completed runs: $130.68.

Tier | Avg Cost | Cost Savings | Avg Time | Time Savings | Avg Tokens | Token Savings | Cost Saved / Run
None | $0.1307 | +64.1% | 52.5s | +47.9% | 26.6K | +89.1% | $0.2337
Pretty | $0.3644 | +0.0% | 100.9s | +0.0% | 243.0K | +0.0% | -
Minified | $0.3186 | +12.6% | 105.6s | -4.6% | 170.2K | +29.9% | $0.0458
LAP-Std | $0.2556 | +29.9% | 75.2s | +25.5% | 139.5K | +42.6% | $0.1088
LAP-Lean | $0.2375 | +34.8% | 71.5s | +29.2% | 117.2K | +51.8% | $0.1269

Average Cost per Run by Tier (chart data)

None $0.131 | Pretty $0.364 | Minified $0.319 | LAP-Std $0.256 | LAP-Lean $0.237
ROI analysis: Switching from the pretty tier to LAP-Lean saves approximately 35% per run. At scale across thousands of agent invocations, this translates to substantial infrastructure savings with minimal impact on task quality. The none tier costs least but scores worst - documentation cost is an excellent investment.
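The scale argument can be made concrete with back-of-envelope arithmetic from the per-run costs above. The 100,000-invocation volume below is a hypothetical example, not a benchmark figure:

```python
# Per-run costs from the tier table (USD)
cost_pretty = 0.3644
cost_lean = 0.2375

runs = 100_000  # hypothetical monthly agent invocations

saved = (cost_pretty - cost_lean) * runs
pct = 1 - cost_lean / cost_pretty
print(f"${saved:,.0f} saved ({pct:.1%})")  # $12,690 saved (34.8%)
```

At that hypothetical volume, switching documentation tiers alone recovers five figures per month with no harness or model changes.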

6 Results - Wall Time Analysis

Execution wall time per run by tier and format. Lower times indicate faster agent completion. Wall time includes the full agent execution cycle (prompt processing, inference, code generation).

Total Wall Time: 11.3h (40572s across 500 runs)
Avg per Run: 81.1s (across all tiers)
LAP-Lean vs Pretty: 29% faster (29.4s faster per run)
Fastest Tier: None (52.5s avg)
Tier | n | Avg | Median | Min | Max | Std Dev | Total | % vs Pretty
None | 100 | 52.5s | 51.8s | 28.3s | 87.8s | 10.2s | 5254s | +47.9%
Pretty | 100 | 100.9s | 97.3s | 36.6s | 214.3s | 37.9s | 10091s | -
Minified | 100 | 105.6s | 83.1s | 37.3s | 348.5s | 66.7s | 10560s | -4.6%
LAP-Std | 100 | 75.2s | 64.6s | 32.2s | 176.4s | 30.7s | 7518s | +25.5%
LAP-Lean | 100 | 71.5s | 61.8s | 31.1s | 196.1s | 28.6s | 7149s | +29.2%

Average Wall Time per Run by Tier (chart data)

None 52.5s | Pretty 100.9s | Minified 105.6s | LAP-Std 75.2s | LAP-Lean 71.5s

Average Wall Time by Format and Tier

Format | None | Pretty | Minified | LAP-Std | LAP-Lean
OPENAPI | 49.4s | 109.5s | 111.8s | 91.9s | 74.3s
ASYNCAPI | 54.1s | 61.3s | 58.9s | 55.2s | 54.7s
GRAPHQL | 47.5s | 120.8s | 110.4s | 99.6s | 102.7s
POSTMAN | 62.4s | 100.1s | 183.0s | 70.8s | 66.1s
PROTOBUF | 49.4s | 112.8s | 63.9s | 58.5s | 59.7s
Wall time insights: The none tier averages 52.5s (fastest, no doc to process), while LAP-Lean averages 71.5s - only 19.0s more despite providing full endpoint schema information. LAP-Lean runs 29% faster than Pretty (71.5s vs 100.9s), saving 29.4s per run on average.

7 Results - Spec-Level Score Heatmap (50 Specs)

Average score per (spec, tier) pair, averaged across the t1 and t2 tasks. Color coding indicates performance level. Because all 500 runs completed, every cell is populated.

Green: score ≥ 0.9 Yellow: ≥ 0.7 Orange: ≥ 0.5 Red: < 0.5
Spec | Format | None | Pretty | Minified | LAP-Std | LAP-Lean
box | openapi | 0.632 | 1.000 | 0.996 | 0.996 | 1.000
digitalocean | openapi | 0.284 | 1.000 | 1.000 | 1.000 | 1.000
figma | openapi | 0.482 | 0.949 | 0.979 | 0.966 | 0.979
github-rest | openapi | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
plaid | openapi | 1.000 | 1.000 | 1.000 | 0.975 | 1.000
resend | openapi | 0.446 | 0.976 | 0.997 | 0.997 | 0.997
slack | openapi | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
spotify | openapi | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
stripe | openapi | 0.548 | 0.976 | 0.996 | 0.996 | 0.976
twilio | openapi | 0.986 | 0.983 | 0.989 | 0.983 | 0.983
adeo-kafka | asyncapi | 0.077 | 0.510 | 0.670 | 0.556 | 0.490
correlation-id | asyncapi | 0.020 | 0.845 | 0.600 | 0.915 | 0.915
gitter-streaming | asyncapi | 0.762 | 0.918 | 0.956 | 0.888 | 0.956
kraken-websocket | asyncapi | 0.403 | 1.000 | 1.000 | 0.942 | 0.782
operation-security | asyncapi | 0.177 | 0.518 | 0.518 | 0.484 | 0.484
rpc-server | asyncapi | 0.290 | 0.378 | 0.378 | 0.378 | 0.378
slack-rtm | asyncapi | 0.500 | 0.623 | 0.623 | 0.623 | 0.633
social-media | asyncapi | 0.070 | 0.772 | 0.922 | 1.000 | 0.998
streetlights | asyncapi | 0.224 | 1.000 | 0.825 | 1.000 | 1.000
websocket-gemini | asyncapi | 0.460 | 0.990 | 0.990 | 0.796 | 0.796
artsy-gql | graphql | 0.259 | 0.714 | 0.714 | 0.732 | 0.873
coral-gql | graphql | 0.051 | 0.803 | 1.000 | 1.000 | 0.850
elastic-gql | graphql | 0.360 | 0.850 | 0.840 | 0.850 | 0.840
github-gql | graphql | 0.629 | 0.885 | 0.885 | 0.885 | 0.885
linear-gql | graphql | 0.567 | 0.783 | 0.783 | 0.633 | 0.633
saleor-gql | graphql | 0.293 | 0.825 | 0.933 | 0.697 | 0.933
shopify-gql | graphql | 0.020 | 1.000 | 1.000 | 1.000 | 1.000
swapi-gql | graphql | 0.263 | 0.855 | 0.830 | 0.855 | 0.830
unraid-gql | graphql | 0.020 | 1.000 | 1.000 | 1.000 | 1.000
yelp-gql | graphql | 0.487 | 0.917 | 0.917 | 0.934 | 0.934
adobe-postman | postman | 0.486 | 0.970 | 0.982 | 0.970 | 0.984
akeneo-postman | postman | 0.666 | 1.000 | 0.797 | 1.000 | 1.000
auth0-postman | postman | 0.487 | 1.000 | 1.000 | 1.000 | 1.000
azure-devops-postman | postman | 0.079 | 1.000 | 1.000 | 1.000 | 1.000
braintree-postman | postman | 0.020 | 0.360 | 0.360 | 0.360 | 0.190
influxdb-postman | postman | 0.861 | 1.000 | 0.976 | 1.000 | 0.976
postman-echo | postman | 0.680 | 0.847 | 0.828 | 0.998 | 1.000
sap-postman | postman | 0.068 | 0.609 | 0.500 | 1.000 | 1.000
stripe-postman | postman | 0.888 | 0.840 | 0.976 | 0.925 | 0.925
twilio-postman | postman | 0.062 | 1.000 | 1.000 | 1.000 | 1.000
google-billing | protobuf | 0.225 | 0.310 | 0.310 | 0.680 | 0.990
google-datacatalog | protobuf | 0.547 | 0.654 | 0.988 | 0.675 | 0.675
google-firestore | protobuf | 0.257 | 0.718 | 0.550 | 0.742 | 0.732
google-language | protobuf | 0.420 | 0.518 | 0.714 | 0.483 | 0.483
google-pubsub | protobuf | 0.212 | 0.798 | 0.724 | 0.804 | 0.995
google-spanner | protobuf | 0.304 | 0.734 | 0.943 | 0.800 | 0.987
google-storage | protobuf | 0.091 | 0.887 | 0.849 | 0.869 | 0.905
google-talent | protobuf | 0.096 | 0.650 | 0.966 | 0.655 | 0.786
google-translate | protobuf | 0.020 | 0.696 | 0.671 | 0.643 | 0.489
google-vision | protobuf | 0.406 | 0.570 | 0.255 | 0.305 | 0.311

8 Results - Format Comparison (5 Formats)

Performance comparison across all five API specification formats. Each format contains 10 specs. The benchmark covers OpenAPI (REST), AsyncAPI (event-driven), GraphQL (query language), Postman (collection format), and Protobuf (binary protocol).

Format | None | Pretty | Minified | LAP-Std | LAP-Lean (avg score, n=20 per cell)
OPENAPI | 0.738 | 0.988 | 0.996 | 0.991 | 0.993
ASYNCAPI | 0.298 | 0.755 | 0.748 | 0.758 | 0.743
GRAPHQL | 0.295 | 0.863 | 0.890 | 0.859 | 0.878
POSTMAN | 0.430 | 0.863 | 0.842 | 0.925 | 0.907
PROTOBUF | 0.258 | 0.654 | 0.697 | 0.666 | 0.735

Format observation: OPENAPI peaks at 0.996 (Minified); ASYNCAPI peaks at 0.758 (LAP-Std); GRAPHQL peaks at 0.890 (Minified); POSTMAN peaks at 0.925 (LAP-Std); PROTOBUF peaks at 0.735 (LAP-Lean). GraphQL and Protobuf formats show the largest variability - these formats often have very large pretty-printed files that stress the context window, but their LAP-format representations are compact and well-structured. AsyncAPI event-driven APIs show somewhat lower absolute scores because the 'endpoint' concept maps differently to channel/operation pairs vs REST HTTP methods.

9 Results - Task Difficulty Comparison (t1 vs t2)

Each spec has two tasks (t1 and t2), both phrased in business language to avoid endpoint-revealing technical terms. This section examines whether task difficulty varies systematically between the two task slots across all 50 specs and 5 tiers.

Task | None | Pretty | Minified | LAP-Std | LAP-Lean | Overall Avg (avg score, n=50 per cell)
t1 | 0.431 | 0.820 | 0.819 | 0.834 | 0.851 | 0.751
t2 | 0.377 | 0.829 | 0.850 | 0.845 | 0.851 | 0.751
Task difficulty finding: t1 and t2 both average 0.751 overall. Tasks are approximately equally difficult across the two task slots, suggesting the task assignment process was balanced. The none-tier gap between tasks (0.431 vs 0.377) is somewhat larger, indicating some task-specific prior-knowledge variance when no documentation is provided.

10 Results - Code Quality Analysis

Code quality (10% of total score) measures whether the agent produced executable Python code containing the correct endpoints and parameters. Agents with documentation consistently produce better-structured code across all 50 specs.

Tier | Avg Code Score | Endpoint Score | Param Score | Overall Score
None | 0.515 | 0.357 | 0.444 | 0.404
Pretty | 0.857 | 0.777 | 0.902 | 0.825
Minified | 0.856 | 0.792 | 0.896 | 0.835
LAP-Std | 0.865 | 0.800 | 0.904 | 0.840
LAP-Lean | 0.863 | 0.823 | 0.899 | 0.851

Code Quality Score by Tier (chart data)

None 0.515 | Pretty 0.857 | Minified 0.856 | LAP-Std 0.865 | LAP-Lean 0.863
Code quality finding: The no-doc baseline achieves an average code score of 0.515, while LAP-Std achieves 0.865. Documentation not only improves endpoint identification but also leads to higher-quality, more structured code output from the agent. This pattern is consistent across all 5 formats and all 50 API specs tested.

11 Results - Score Distribution

Score range, central tendency, and variability per tier across all 500 completed runs. "% Perfect" = runs scoring exactly 1.0. "% Good" = runs scoring 0.7 or above.

Tier | n | Min | Max | Mean | Median | Std Dev | % Perfect | % Good | Delta vs None
None | 100 | 0.020 | 1.000 | 0.404 | 0.301 | 0.351 | 10% | 28% | -
Pretty | 100 | 0.218 | 1.000 | 0.825 | 0.953 | 0.233 | 38% | 73% | +0.421
Minified | 100 | 0.218 | 1.000 | 0.835 | 0.964 | 0.230 | 35% | 75% | +0.431
LAP-Std | 100 | 0.263 | 1.000 | 0.840 | 0.970 | 0.225 | 41% | 77% | +0.436
LAP-Lean | 100 | 0.020 | 1.000 | 0.851 | 0.984 | 0.226 | 40% | 79% | +0.448
Distribution highlight: The no-doc baseline has a mean score of 0.404 while LAP-Lean achieves 0.851. The gap of +0.448 is driven primarily by endpoint identification: without documentation, agents default to guessing endpoints from their training knowledge, which is unreliable for non-famous or domain-specific APIs. The documentation gap (none vs documented tiers) is statistically significant (p << 0.001); differences between documented tiers are directional only (see Section 12).

12 Statistical Analysis

12.1 Descriptive Statistics

With approximately 100 runs per tier (50 specs x 2 tasks), standard error and 95% confidence intervals are provided for transparency.

Tier | n | Mean Score | Std Dev | Std Error | 95% CI
None | 100 | 0.404 | 0.351 | 0.035 | ±0.069
Pretty | 100 | 0.825 | 0.233 | 0.023 | ±0.046
Minified | 100 | 0.835 | 0.230 | 0.023 | ±0.045
LAP-Std | 100 | 0.840 | 0.225 | 0.023 | ±0.044
LAP-Lean | 100 | 0.851 | 0.226 | 0.023 | ±0.044
Important: 95% confidence intervals for all four documented tiers overlap completely. This means no pairwise tier comparison among documented tiers reaches conventional statistical significance (p < 0.05). The large standard deviations (~0.23) reflect genuine variance across API complexity, not measurement noise. Each condition was tested once (n=1 per spec-tier-task); per-spec differences should be interpreted with caution.
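The standard errors and CI half-widths in the table follow directly from the per-tier standard deviations. A minimal sketch using the none-tier values and the normal approximation:

```python
import math

n, std_dev = 100, 0.351          # none tier: n runs, sample std dev

se = std_dev / math.sqrt(n)      # standard error of the mean
ci95 = 1.96 * se                 # 95% CI half-width (normal approximation)

print(round(se, 3), round(ci95, 3))  # 0.035 0.069
```

The same arithmetic on the documented tiers (std dev ~0.23) yields half-widths of ~±0.045, so the tier means (0.825 to 0.851) all sit inside one another's intervals, which is the overlap the paragraph above describes.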

12.2 Paired t-Tests: Task Score

Paired t-tests (df=49, two-tailed) comparing mean score per spec across tiers. Each spec contributes one data point (mean of t1 and t2). Cohen's d measures effect size.

Comparison | Mean Diff | t-stat | n | Cohen's d | Sig
Pretty vs None | +0.4209 | 10.302 | 50 | 1.457 | ***
LAP-Lean vs None | +0.4477 | 9.892 | 50 | 1.399 | ***
Minified vs Pretty | +0.0099 | 0.591 | 50 | 0.084 | ns
LAP-Std vs Pretty | +0.0152 | 0.998 | 50 | 0.141 | ns
LAP-Lean vs Pretty | +0.0268 | 1.278 | 50 | 0.181 | ns
LAP-Lean vs LAP-Std | +0.0116 | 0.948 | 50 | 0.134 | ns
Key finding: The ONLY statistically significant score comparisons are none vs documented tiers (p << 0.001). All comparisons between documented tiers (pretty vs minified, pretty vs LAP-Std, pretty vs LAP-Lean, LAP-Std vs LAP-Lean) fail to reach significance. The real story is: any documentation >> no documentation.
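The paired comparisons above can be sketched with the standard formulas: the t statistic on per-spec score differences, and Cohen's d for paired data (mean difference over the standard deviation of differences). This is an illustrative reimplementation with toy data, not the benchmark harness's analysis code:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t statistic, Cohen's d) for paired samples a vs b.
    Each index pairs one spec's score under two tiers."""
    diffs = [x - y for x, y in zip(a, b)]
    d_mean, d_sd = mean(diffs), stdev(diffs)   # sample std dev (n-1)
    t = d_mean / (d_sd / sqrt(len(diffs)))     # t with df = n-1
    return t, d_mean / d_sd                    # Cohen's d for paired data

# Toy example: tier A consistently ~0.4 above tier B across five specs
a = [0.9, 0.8, 1.0, 0.7, 0.85]
b = [0.5, 0.4, 0.55, 0.3, 0.5]
t_stat, cohens_d = paired_t(a, b)
```

Pairing by spec is what gives the design its power: spec-to-spec difficulty variance (the ~0.23 standard deviations noted in 12.1) cancels out of the differences, so even modest mean gaps can be detected when they are consistent across specs.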

12.3 Paired t-Tests: Wall Time

Paired t-tests for wall time (seconds). Positive values mean the second tier is slower.

Comparison | Mean Diff | t-stat | n | Sig
Pretty vs None | +48.4s | 9.180 | 50 | ***
LAP-Lean vs None | +19.0s | 4.739 | 50 | ***
Minified vs Pretty | +4.7s | 0.557 | 50 | ns
LAP-Std vs Pretty | -25.7s | -5.860 | 50 | ***
LAP-Lean vs Pretty | -29.4s | -6.712 | 50 | ***
LAP-Lean vs LAP-Std | -3.7s | -1.985 | 50 | ns

12.4 Paired t-Tests: Cost (USD)

Paired t-tests for per-run cost. Negative values mean the second tier is cheaper.

Comparison | Mean Diff | t-stat | n | Sig
Pretty vs None | +0.2337 | 9.714 | 50 | ***
LAP-Lean vs None | +0.1068 | 5.388 | 50 | ***
Minified vs Pretty | -0.0458 | -1.991 | 50 | ns
LAP-Std vs Pretty | -0.1088 | -4.538 | 50 | ***
LAP-Lean vs Pretty | -0.1269 | -5.688 | 50 | ***
LAP-Lean vs LAP-Std | -0.0181 | -2.154 | 50 | *

12.5 Paired t-Tests: Total Tokens

Paired t-tests for total tokens consumed per run. Negative values mean the second tier uses fewer tokens.

Comparison | Mean Diff | t-stat | n | Sig
Pretty vs None | +216.4K | 7.780 | 50 | ***
LAP-Lean vs None | +90.6K | 5.039 | 50 | ***
Minified vs Pretty | -72.7K | -2.877 | 50 | **
LAP-Std vs Pretty | -103.4K | -3.524 | 50 | ***
LAP-Lean vs Pretty | -125.8K | -4.683 | 50 | ***
LAP-Lean vs LAP-Std | -22.3K | -2.008 | 50 | ns
Efficiency finding: While score differences between documented tiers are NOT significant, wall time and token differences ARE. LAP-Lean vs Pretty shows highly significant reductions in wall time (p < 0.001), cost (p < 0.001), and total tokens (p < 0.001). This is the report's strongest defensible claim: equivalent quality at significantly lower cost.

12.6 Documentation Lift (Tier Score - None Score)

How much does each tier improve over the no-documentation baseline, per spec? This isolates documentation's contribution from prior knowledge.

Tier | Mean Lift | Median Lift | Min | Max | Std Dev
Pretty | +0.421 | +0.444 | -0.048 | +0.980 | 0.289
Minified | +0.431 | +0.452 | -0.151 | +0.980 | 0.300
LAP-Std | +0.436 | +0.464 | -0.101 | +0.980 | 0.313
LAP-Lean | +0.448 | +0.458 | -0.095 | +0.980 | 0.320

13 Results - Documentation-Sensitive Subset Analysis

Some APIs score ≥ 0.9 across all tiers including the no-documentation baseline, indicating the model has strong prior knowledge. These "ceiling-effect" specs contribute 1.0 to every tier's average, compressing observed differences and measuring API familiarity rather than documentation quality. This section excludes 5 such specs to isolate the documentation signal.

Excluded specs (none-tier ≥ 0.9): github-rest, plaid, slack, spotify, twilio

Full Set vs Documentation-Sensitive Subset (45 specs)

Tier | Full Set (50 specs) | Sensitive Subset (45 specs) | Delta
None | 0.404 | 0.338 | -0.066
Pretty | 0.825 | 0.806 | -0.019
Minified | 0.835 | 0.816 | -0.018
LAP-Std | 0.840 | 0.823 | -0.017
LAP-Lean | 0.851 | 0.835 | -0.016

Paired t-Tests on Sensitive Subset

Comparison | Mean Diff | t-stat | n | Sig
Pretty vs None | +0.4677 | 11.816 | 45 | ***
LAP-Lean vs None | +0.4976 | 11.208 | 45 | ***
LAP-Lean vs Pretty | +0.0298 | 1.279 | 45 | ns
Subset insight: When ceiling-effect specs are excluded, the none-tier score drops further (revealing the true documentation gap), while documented-tier ordering remains similar. The documentation-sensitive subset shows the real signal: documentation matters most for APIs the model does not already know well.

14 Limitations

This benchmark has several methodological limitations that should be considered when interpreting results:

Limitation | Impact | Mitigation
N=1 per condition | Each (spec, tier, task) ran exactly once; LLM outputs are stochastic, so observed per-spec differences may be sampling noise. | Aggregate tier comparisons (n=100 per tier) are more reliable. Future: n≥3 per condition.
Fixed tier order | Tiers executed in a fixed order (none, pretty, minified, lap-standard, lap-lean); later tiers could benefit from model-level caching. | Double-UUID isolation prevents filesystem contamination. Future: randomize tier order.
Single model | Only Claude Sonnet 4.5 was tested; other models may respond differently to compression. | Future: cross-model validation with GPT-4o, Gemini, Claude Opus.
Recall-only endpoint scoring | Endpoint score is recall-based (correct_hits / expected_count) with no penalty for false positives; verbose agents may be overly rewarded. | Future: add an F1-based metric with a precision penalty.
WebFetch available in none-tier | Agents could theoretically fetch API docs from the web during none-tier runs, breaking the prior-knowledge assumption. | Future: restrict the WebFetch tool for none-tier runs and verify via session recordings.
Concurrency=3 | Benchmark ran with ThreadPoolExecutor(max_workers=3); non-deterministic execution order means results are not perfectly reproducible. | Future: use concurrency=1 for scientific runs.
Ceiling-effect specs | 5 specs score ≥0.9 across all tiers including none, compressing observed tier differences. | Section 13 provides a documentation-sensitive subset analysis excluding these specs.
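The recall-vs-F1 limitation can be illustrated concretely. This sketch contrasts the benchmark's recall-only endpoint metric with the F1 alternative proposed as future work; the set-based representation and endpoint strings are illustrative, not the harness's actual code:

```python
def recall_score(predicted: set[str], expected: set[str]) -> float:
    """Current metric: correct hits / expected count (no FP penalty)."""
    return len(predicted & expected) / len(expected)

def f1_score(predicted: set[str], expected: set[str]) -> float:
    """Proposed metric: harmonic mean of precision and recall,
    penalizing endpoints the agent names but the task doesn't need."""
    hits = len(predicted & expected)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(expected)
    return 2 * precision * recall / (precision + recall)

# A verbose agent listing 4 endpoints when 2 are expected keeps perfect
# recall, but F1 penalizes the two false positives.
expected = {"GET /users", "POST /users"}
verbose = {"GET /users", "POST /users", "DELETE /users", "GET /admin"}
print(recall_score(verbose, expected))           # 1.0
print(round(f1_score(verbose, expected), 3))     # 0.667
```

Under the current metric both a precise and a shotgun agent score identically; the F1 variant would separate them without changing the endpoint-identification weight in the rubric.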

15 Discussion

LAP-Lean: Best Efficiency Tier

LAP-Lean achieves the numerically highest average score of 0.851 compared to the pretty baseline (0.825), though this difference (+0.027) does not reach statistical significance (paired t-test, p > 0.05). The significant finding is efficiency: LAP-Lean delivers approximately 88% documentation token reduction and ~35% inference cost savings per run (p < 0.001 for both wall time and token reduction). For production AI coding workflows where agents are invoked at scale, LAP-Lean offers equivalent quality at significantly lower cost and latency. This result holds across all 5 formats and 50 APIs.

No-Doc Baseline: Documentation Is Essential

The no-documentation baseline scores 0.404 on average - substantially below all documented tiers across all 5 formats tested. The primary failure mode is endpoint identification (EP score): without documentation, agents cannot reliably identify the correct API endpoint path for unfamiliar or domain-specific APIs. This confirms that documentation is not merely helpful but essential for reliable AI agent performance. The none-tier effect is especially pronounced for Protobuf, AsyncAPI, and less well-known APIs where the model has limited training signal.

Minification Provides Limited Benefit Over Pretty

The minified tier achieves 0.835 vs 0.825 for pretty -- a negligible difference that does not reach statistical significance. Despite removing whitespace, minification of YAML/JSON/GraphQL does not meaningfully reduce semantic token count because structural keywords, field names, and values remain unchanged. LAP-format representations are fundamentally different: they reorganize information into a denser, agent-optimized structure rather than simply removing whitespace.

LAP-Standard: Intermediate Trade-off

LAP-Standard achieves 0.840 - essentially identical to LAP-Lean at 0.851 - while using approximately 82% fewer tokens than the pretty baseline. The near-identical scores between LAP-Standard and LAP-Lean suggest that for the tested task types, type information alone (LAP-Lean) is sufficient for correct task completion. However, LAP-Standard may be preferable for agents working with unfamiliar APIs where natural language endpoint descriptions provide additional disambiguation context, or for more complex multi-step tasks.

Cross-Format Robustness

The LAP format compression benefit is consistent across all 5 tested specification formats: OpenAPI (REST), AsyncAPI (event-driven), GraphQL (query language), Postman (collection), and Protobuf (binary protocol). This cross-format robustness is significant because each format has a fundamentally different structure and information organization. The LAP converter successfully normalizes all formats into a common endpoint-centric representation that preserves the information an AI agent needs for task completion.

16 Conclusion

This full benchmark (500 runs across 50 APIs, 5 formats, 5 tiers, 2 tasks) yields two clear findings:

1. Documentation is essential. Providing any form of API documentation dramatically improves agent performance (mean documented score ~0.84 vs none-tier 0.40, p << 0.001). This holds across all 5 specification formats. Agents cannot reliably identify API endpoints from training knowledge alone, particularly for domain-specific or less-prominent APIs.

2. Compression preserves quality while significantly reducing cost. No statistically significant quality difference was detected between any documented tier (paired t-tests, all p > 0.05). LAP-Lean achieved the numerically highest score (0.851 vs pretty 0.825), but this difference is directional only. The statistically significant advantages of LAP-Lean are efficiency: 88% documentation token reduction, 35% cost savings (p < 0.001), and 29s faster per run (p < 0.001).

Recommended Production Configuration

Future Work

A Appendix A: Individual Runs

All 500 individual runs, sorted by format, spec, tier, task. All runs completed successfully.

Spec | Format | Tier | Task | Status | Score | EP | Par | Code | Time | Cost | Total Tokens | Doc Tokens | Turns
box openapi None t1 completed 0.567 0.500 0.667 0.667 52.3s $0.1390 24.4K 0 1
box openapi None t2 completed 0.698 1.000 0.125 0.600 62.2s $0.1509 24.9K 0 1
box openapi Pretty t1 completed 1.000 1.000 1.000 1.000 99.9s $0.3700 226.4K 349.6K 12
box openapi Pretty t2 completed 1.000 1.000 1.000 1.000 111.2s $0.4023 207.1K 349.6K 13
box openapi Minified t1 completed 0.993 1.000 1.000 0.933 100.4s $0.3039 181.4K 277.9K 11
box openapi Minified t2 completed 1.000 1.000 1.000 1.000 109.1s $0.4198 260.5K 277.9K 12
box openapi LAP-Std t1 completed 0.993 1.000 1.000 0.933 117.9s $0.5256 448.4K 69.3K 14
box openapi LAP-Std t2 completed 1.000 1.000 1.000 1.000 94.7s $0.3845 209.6K 69.3K 7
box openapi LAP-Lean t1 completed 1.000 1.000 1.000 1.000 106.2s $0.3145 221.2K 19.8K 11
box openapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 81.3s $0.2735 155.1K 19.8K 7
digitalocean openapi None t1 completed 0.279 0.000 0.429 0.429 40.3s $0.1245 38.6K 0 1
digitalocean openapi None t2 completed 0.290 0.000 0.500 0.400 41.6s $0.1326 39.0K 0 1
digitalocean openapi Pretty t1 completed 1.000 1.000 1.000 1.000 150.3s $0.7735 887.9K 22.6K 19
digitalocean openapi Pretty t2 completed 1.000 1.000 1.000 1.000 122.2s $0.7965 785.8K 22.6K 15
digitalocean openapi Minified t1 completed 1.000 1.000 1.000 1.000 179.0s $0.6413 451.9K 22.0K 10
digitalocean openapi Minified t2 completed 1.000 1.000 1.000 1.000 286.7s $0.8342 715.8K 22.0K 18
digitalocean openapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 53.6s $0.2661 91.6K 9.1K 2
digitalocean openapi LAP-Std t2 completed 1.000 1.000 1.000 1.000 37.8s $0.2366 90.4K 9.1K 2
digitalocean openapi LAP-Lean t1 completed 1.000 1.000 1.000 1.000 55.6s $0.2782 92.1K 9.1K 2
digitalocean openapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 43.9s $0.2402 90.6K 9.1K 2
figma openapi None t1 completed 0.408 0.500 0.200 0.480 61.3s $0.1392 24.4K 0 1
figma openapi None t2 completed 0.555 0.500 0.545 0.618 47.1s $0.1410 39.3K 0 1
figma openapi Pretty t1 completed 0.898 1.000 0.800 0.880 65.1s $0.2779 202.5K 72.5K 7
figma openapi Pretty t2 completed 1.000 1.000 1.000 1.000 100.3s $0.5377 435.2K 72.5K 13
figma openapi Minified t1 completed 0.958 1.000 0.900 0.880 92.1s $0.3752 299.7K 71.4K 9
figma openapi Minified t2 completed 1.000 1.000 1.000 1.000 83.5s $0.4866 256.3K 71.4K 7
figma openapi LAP-Std t1 completed 0.932 1.000 0.800 0.920 93.4s $0.2839 154.3K 14.1K 8
figma openapi LAP-Std t2 completed 1.000 1.000 1.000 1.000 56.9s $0.2581 92.1K 14.1K 2
figma openapi LAP-Lean t1 completed 0.958 1.000 1.000 0.880 47.8s $0.1911 81.7K 4.2K 2
figma openapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 66.0s $0.2244 83.1K 4.2K 2
github-rest openapi None t1 completed 1.000 1.000 1.000 1.000 55.5s $0.1353 24.3K 0 1
github-rest openapi None t2 completed 1.000 1.000 1.000 1.000 44.3s $0.1198 23.7K 0 1
github-rest openapi Pretty t1 completed 1.000 1.000 1.000 1.000 121.2s $0.3827 259.2K 1.80M 12
github-rest openapi Pretty t2 completed 1.000 1.000 1.000 1.000 129.8s $0.5117 420.4K 1.80M 17
github-rest openapi Minified t1 completed 1.000 1.000 1.000 1.000 89.7s $0.2922 158.2K 1.80M 8
github-rest openapi Minified t2 completed 1.000 1.000 1.000 1.000 117.5s $0.4979 375.2K 1.80M 16
github-rest openapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 96.9s $0.4089 194.4K 331.5K 9
github-rest openapi LAP-Std t2 completed 1.000 1.000 1.000 1.000 116.0s $0.5024 360.0K 331.5K 13
github-rest openapi LAP-Lean t1 completed 1.000 1.000 1.000 1.000 87.1s $0.2922 127.5K 201.7K 7
github-rest openapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 120.7s $0.4947 290.5K 201.7K 12
plaid openapi None t1 completed 1.000 1.000 1.000 1.000 48.9s $0.1254 23.9K 0 1
plaid openapi None t2 completed 1.000 1.000 1.000 1.000 54.4s $0.1235 23.8K 0 1
plaid openapi Pretty t1 completed 1.000 1.000 1.000 1.000 160.3s $0.5964 438.7K 595.5K 16
plaid openapi Pretty t2 completed 1.000 1.000 1.000 1.000 105.7s $0.4636 266.9K 595.5K 14
plaid openapi Minified t1 completed 1.000 1.000 1.000 1.000 118.5s $0.5099 323.5K 594.9K 13
plaid openapi Minified t2 completed 1.000 1.000 1.000 1.000 104.3s $0.4002 179.2K 594.9K 10
plaid openapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 108.8s $0.3197 173.8K 78.5K 9
plaid openapi LAP-Std t2 completed 0.950 1.000 0.833 1.000 101.9s $0.3920 157.2K 78.5K 10
plaid openapi LAP-Lean t1 completed 1.000 1.000 1.000 1.000 98.7s $0.3328 194.5K 34.2K 7
plaid openapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 88.7s $0.3603 166.3K 34.2K 9
resend openapi None t1 completed 0.311 0.000 0.857 0.543 39.8s $0.2795 38.5K 0 1
resend openapi None t2 completed 0.580 0.500 0.667 0.800 54.9s $0.1328 24.2K 0 1
resend openapi Pretty t1 completed 0.951 1.000 0.857 0.943 57.4s $0.3122 282.4K 21.9K 7
resend openapi Pretty t2 completed 1.000 1.000 1.000 1.000 67.9s $0.4243 350.1K 21.9K 11
resend openapi Minified t1 completed 0.994 1.000 1.000 0.943 52.4s $0.2892 246.6K 21.8K 6
resend openapi Minified t2 completed 1.000 1.000 1.000 1.000 86.6s $0.3479 355.1K 21.8K 9
resend openapi LAP-Std t1 completed 0.994 1.000 1.000 0.943 32.3s $0.1779 82.6K 6.1K 2
resend openapi LAP-Std t2 completed 1.000 1.000 1.000 1.000 50.5s $0.1884 83.0K 6.1K 2
resend openapi LAP-Lean t1 completed 0.994 1.000 1.000 0.943 31.7s $0.1612 80.0K 3.8K 2
resend openapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 40.3s $0.1705 80.4K 3.8K 2
slack openapi None t1 completed 1.000 1.000 1.000 1.000 58.0s $0.1348 24.3K 0 1
slack openapi None t2 completed 1.000 1.000 1.000 1.000 47.1s $0.1126 23.4K 0 1
slack openapi Pretty t1 completed 1.000 1.000 1.000 1.000 97.3s $0.3162 137.0K 167.3K 8
slack openapi Pretty t2 completed 1.000 1.000 1.000 1.000 87.6s $0.2937 157.2K 167.3K 7
slack openapi Minified t1 completed 1.000 1.000 1.000 1.000 89.6s $0.2934 178.5K 124.1K 7
slack openapi Minified t2 completed 1.000 1.000 1.000 1.000 85.5s $0.2674 151.1K 124.1K 6
slack openapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 98.1s $0.2865 150.8K 19.5K 10
slack openapi LAP-Std t2 completed 1.000 1.000 1.000 1.000 116.2s $0.4043 317.4K 19.5K 14
slack openapi LAP-Lean t1 completed 1.000 1.000 1.000 1.000 61.5s $0.2358 60.2K 8.2K 2
slack openapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 52.5s $0.2238 59.7K 8.2K 2
spotify openapi None t1 completed 1.000 1.000 1.000 1.000 55.9s $0.1315 24.1K 0 1
spotify openapi None t2 completed 1.000 1.000 1.000 1.000 52.2s $0.1366 24.3K 0 1
spotify openapi Pretty t1 completed 1.000 1.000 1.000 1.000 145.8s $0.4908 393.2K 64.4K 15
spotify openapi Pretty t2 completed 1.000 1.000 1.000 1.000 88.5s $0.3625 217.0K 64.4K 10
spotify openapi Minified t1 completed 1.000 1.000 1.000 1.000 123.8s $0.4856 378.9K 64.9K 12
spotify openapi Minified t2 completed 1.000 1.000 1.000 1.000 110.9s $0.4117 294.1K 64.9K 13
spotify openapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 123.8s $0.3584 237.8K 16.8K 10
spotify openapi LAP-Std t2 completed 1.000 1.000 1.000 1.000 66.0s $0.2042 95.6K 16.8K 5
spotify openapi LAP-Lean t1 completed 1.000 1.000 1.000 1.000 60.3s $0.2014 54.3K 5.7K 2
spotify openapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 61.4s $0.2027 54.4K 5.7K 2
stripe openapi None t1 completed 0.737 0.500 1.000 0.750 41.5s $0.1288 38.8K 0 1
stripe openapi None t2 completed 0.360 0.000 1.000 0.600 42.4s $0.1302 23.3K 0 1
stripe openapi Pretty t1 completed 0.958 1.000 0.875 0.950 116.6s $0.4883 211.3K 1.03M 15
stripe openapi Pretty t2 completed 0.993 1.000 1.000 0.933 100.5s $0.5315 567.9K 1.03M 13
stripe openapi Minified t1 completed 1.000 1.000 1.000 1.000 110.3s $0.4342 173.9K 983.6K 13
stripe openapi Minified t2 completed 0.993 1.000 1.000 0.933 72.2s $0.3083 174.2K 983.6K 9
stripe openapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 129.1s $0.5779 403.0K 154.3K 17
stripe openapi LAP-Std t2 completed 0.993 1.000 1.000 0.933 74.8s $0.4145 390.7K 154.3K 9
stripe openapi LAP-Lean t1 completed 0.958 1.000 0.875 0.950 94.2s $0.5041 446.7K 118.1K 13
stripe openapi LAP-Lean t2 completed 0.993 1.000 1.000 0.933 65.8s $0.3417 283.8K 118.1K 7
twilio openapi None t1 completed 0.989 1.000 1.000 0.886 39.1s $0.1075 23.2K 0 1
twilio openapi None t2 completed 0.983 1.000 1.000 0.829 48.9s $0.1169 23.5K 0 1
twilio openapi Pretty t1 completed 0.989 1.000 1.000 0.886 97.4s $0.4889 322.1K 382.9K 11
twilio openapi Pretty t2 completed 0.977 1.000 1.000 0.771 165.9s $0.7323 761.4K 382.9K 23
twilio openapi Minified t1 completed 0.989 1.000 1.000 0.886 97.5s $0.3837 203.6K 302.1K 7
twilio openapi Minified t2 completed 0.989 1.000 1.000 0.886 126.6s $0.5387 344.7K 302.1K 13
twilio openapi LAP-Std t1 completed 0.989 1.000 1.000 0.886 117.8s $0.7369 803.7K 42.0K 16
twilio openapi LAP-Std t2 completed 0.977 1.000 1.000 0.771 150.8s $0.8239 945.7K 42.0K 19
twilio openapi LAP-Lean t1 completed 0.989 1.000 1.000 0.886 114.5s $0.4353 321.2K 22.0K 10
twilio openapi LAP-Lean t2 completed 0.977 1.000 1.000 0.771 107.7s $0.5564 578.7K 22.0K 12
adeo-kafka asyncapi None t1 completed 0.077 0.000 0.167 0.267 58.6s $0.1386 24.4K 0 1
adeo-kafka asyncapi None t2 completed 0.077 0.000 0.167 0.267 59.4s $0.1428 24.6K 0 1
adeo-kafka asyncapi Pretty t1 completed 0.510 0.250 1.000 0.600 67.2s $0.1955 50.7K 2.5K 2
adeo-kafka asyncapi Pretty t2 completed 0.510 0.250 1.000 0.600 79.0s $0.2093 51.3K 2.5K 2
adeo-kafka asyncapi Minified t1 completed 0.660 0.500 1.000 0.600 68.2s $0.1910 50.4K 2.4K 2
adeo-kafka asyncapi Minified t2 completed 0.680 0.500 1.000 0.800 70.1s $0.1923 50.5K 2.4K 2
adeo-kafka asyncapi LAP-Std t1 completed 0.603 0.500 0.833 0.533 58.1s $0.1485 46.2K 277 2
adeo-kafka asyncapi LAP-Std t2 completed 0.510 0.500 0.500 0.600 60.8s $0.1575 46.6K 277 2
adeo-kafka asyncapi LAP-Lean t1 completed 0.547 0.500 0.667 0.467 57.7s $0.1508 46.2K 127 2
adeo-kafka asyncapi LAP-Lean t2 completed 0.433 0.500 0.333 0.333 58.3s $0.1547 46.4K 127 2
correlation-id asyncapi None t1 completed 0.020 0.000 0.000 0.200 54.5s $0.1328 24.1K 0 1
correlation-id asyncapi None t2 completed 0.020 0.000 0.000 0.200 87.8s $0.2059 48.3K 0 3
correlation-id asyncapi Pretty t1 completed 1.000 1.000 1.000 1.000 64.6s $0.1563 48.0K 1.4K 2
correlation-id asyncapi Pretty t2 completed 0.690 0.500 1.000 0.900 82.8s $0.2058 50.0K 1.4K 2
correlation-id asyncapi Minified t1 completed 0.700 0.500 1.000 1.000 64.4s $0.1626 48.1K 1.4K 2
correlation-id asyncapi Minified t2 completed 0.500 0.500 0.500 0.500 81.9s $0.2003 49.7K 1.4K 2
correlation-id asyncapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 48.3s $0.1304 45.4K 240 2
correlation-id asyncapi LAP-Std t2 completed 0.830 1.000 0.500 0.800 69.8s $0.1644 46.8K 240 2
correlation-id asyncapi LAP-Lean t1 completed 1.000 1.000 1.000 1.000 53.7s $0.1378 45.7K 167 2
correlation-id asyncapi LAP-Lean t2 completed 0.830 1.000 0.500 0.800 75.2s $0.1700 47.0K 167 2
gitter-streaming asyncapi None t1 completed 0.728 1.000 0.200 0.680 53.0s $0.1975 23.8K 0 1
gitter-streaming asyncapi None t2 completed 0.796 1.000 0.400 0.760 61.1s $0.1459 24.7K 0 1
gitter-streaming asyncapi Pretty t1 completed 0.982 1.000 1.000 0.820 58.7s $0.2151 47.0K 1.0K 2
gitter-streaming asyncapi Pretty t2 completed 0.854 1.000 0.600 0.740 62.6s $0.1764 47.6K 1.0K 2
gitter-streaming asyncapi Minified t1 completed 0.990 1.000 1.000 0.900 55.8s $0.1413 46.9K 1.0K 2
gitter-streaming asyncapi Minified t2 completed 0.922 1.000 0.800 0.820 58.8s $0.1513 47.3K 1.0K 2
gitter-streaming asyncapi LAP-Std t1 completed 0.854 1.000 0.600 0.740 55.0s $0.1337 45.5K 120 2
gitter-streaming asyncapi LAP-Std t2 completed 0.922 1.000 0.800 0.820 75.9s $0.1743 47.1K 120 2
gitter-streaming asyncapi LAP-Lean t1 completed 0.922 1.000 0.800 0.820 60.5s $0.1437 45.8K 74 2
gitter-streaming asyncapi LAP-Lean t2 completed 0.990 1.000 1.000 0.900 75.8s $0.1728 46.9K 74 2
kraken-websocket asyncapi None t1 completed 0.703 1.000 0.167 0.533 50.9s $0.1171 23.5K 0 1
kraken-websocket asyncapi None t2 completed 0.104 0.000 0.222 0.378 58.3s $0.1349 24.2K 0 1
kraken-websocket asyncapi Pretty t1 completed 1.000 1.000 1.000 1.000 65.4s $0.1781 50.5K 2.8K 2
kraken-websocket asyncapi Pretty t2 completed 1.000 1.000 1.000 1.000 69.3s $0.1902 51.0K 2.8K 2
kraken-websocket asyncapi Minified t1 completed 1.000 1.000 1.000 1.000 68.9s $0.1842 50.6K 2.8K 2
kraken-websocket asyncapi Minified t2 completed 1.000 1.000 1.000 1.000 71.2s $0.1944 51.0K 2.8K 2
kraken-websocket asyncapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 44.5s $0.1295 45.5K 440 2
kraken-websocket asyncapi LAP-Std t2 completed 0.884 1.000 0.778 0.511 65.6s $0.1689 47.2K 440 2
kraken-websocket asyncapi LAP-Lean t1 completed 0.680 0.500 1.000 0.800 55.2s $0.1426 46.0K 310 2
kraken-websocket asyncapi LAP-Lean t2 completed 0.884 1.000 0.778 0.511 63.7s $0.1668 47.0K 310 2
operation-security asyncapi None t1 completed 0.292 0.000 0.800 0.520 52.1s $0.1290 24.0K 0 1
operation-security asyncapi None t2 completed 0.062 0.000 0.125 0.250 66.0s $0.1378 24.3K 0 1
operation-security asyncapi Pretty t1 completed 0.360 0.000 1.000 0.600 57.0s $0.1479 46.7K 706 2
operation-security asyncapi Pretty t2 completed 0.675 0.500 1.000 0.750 66.3s $0.1602 47.3K 706 2
operation-security asyncapi Minified t1 completed 0.360 0.000 1.000 0.600 67.6s $0.1548 47.0K 696 2
operation-security asyncapi Minified t2 completed 0.675 0.500 1.000 0.750 64.6s $0.1629 47.4K 696 2
operation-security asyncapi LAP-Std t1 completed 0.292 0.000 0.800 0.520 61.0s $0.1377 45.7K 179 2
operation-security asyncapi LAP-Std t2 completed 0.675 0.500 1.000 0.750 68.8s $0.1523 46.2K 179 2
operation-security asyncapi LAP-Lean t1 completed 0.292 0.000 0.800 0.520 56.1s $0.1388 45.7K 164 2
operation-security asyncapi LAP-Lean t2 completed 0.675 0.500 1.000 0.750 72.8s $0.1529 46.3K 164 2
rpc-server asyncapi None t1 completed 0.205 0.000 0.500 0.550 42.1s $0.0952 22.6K 0 1
rpc-server asyncapi None t2 completed 0.375 0.000 1.000 0.750 55.2s $0.1076 23.1K 0 1
rpc-server asyncapi Pretty t1 completed 0.380 0.000 1.000 0.800 53.8s $0.1312 45.7K 352 2
rpc-server asyncapi Pretty t2 completed 0.375 0.000 1.000 0.750 67.5s $0.1464 46.3K 352 2
rpc-server asyncapi Minified t1 completed 0.380 0.000 1.000 0.800 54.3s $0.1271 45.5K 351 2
rpc-server asyncapi Minified t2 completed 0.375 0.000 1.000 0.750 55.0s $0.1352 45.8K 351 2
rpc-server asyncapi LAP-Std t1 completed 0.380 0.000 1.000 0.800 55.3s $0.1318 45.5K 154 2
rpc-server asyncapi LAP-Std t2 completed 0.375 0.000 1.000 0.750 53.7s $0.1314 45.4K 154 2
rpc-server asyncapi LAP-Lean t1 completed 0.380 0.000 1.000 0.800 50.8s $0.1229 45.0K 125 2
rpc-server asyncapi LAP-Lean t2 completed 0.375 0.000 1.000 0.750 56.5s $0.1324 45.5K 125 2
slack-rtm asyncapi None t1 completed 0.737 0.500 1.000 0.750 37.0s $0.1110 38.1K 0 1
slack-rtm asyncapi None t2 completed 0.263 0.000 0.714 0.486 50.8s $0.1206 23.7K 0 1
slack-rtm asyncapi Pretty t1 completed 0.737 0.500 1.000 0.750 40.5s $0.1943 84.3K 5.1K 2
slack-rtm asyncapi Pretty t2 completed 0.510 0.000 1.000 0.600 49.7s $0.2123 85.0K 5.1K 2
slack-rtm asyncapi Minified t1 completed 0.737 0.500 1.000 0.750 38.9s $0.1924 84.2K 5.1K 2
slack-rtm asyncapi Minified t2 completed 0.510 0.000 1.000 0.600 40.4s $0.2008 84.6K 5.1K 2
slack-rtm asyncapi LAP-Std t1 completed 0.737 0.500 1.000 0.750 47.1s $0.1701 79.3K 2.6K 2
slack-rtm asyncapi LAP-Std t2 completed 0.510 0.000 1.000 0.600 45.9s $0.1806 79.7K 2.6K 2
slack-rtm asyncapi LAP-Lean t1 completed 0.755 0.500 1.000 0.800 45.3s $0.1671 78.6K 2.0K 2
slack-rtm asyncapi LAP-Lean t2 completed 0.510 0.000 1.000 0.600 43.7s $0.1688 78.7K 2.0K 2
social-media asyncapi None t1 completed 0.020 0.000 0.000 0.200 65.0s $0.1428 24.6K 0 1
social-media asyncapi None t2 completed 0.120 0.000 0.333 0.200 46.8s $0.1206 23.7K 0 1
social-media asyncapi Pretty t1 completed 0.695 0.500 1.000 0.950 64.6s $0.1466 46.7K 610 2
social-media asyncapi Pretty t2 completed 0.850 0.750 1.000 1.000 61.2s $0.1483 46.8K 610 2
social-media asyncapi Minified t1 completed 0.845 0.750 1.000 0.950 42.8s $0.1299 45.9K 601 2
social-media asyncapi Minified t2 completed 1.000 1.000 1.000 1.000 57.7s $0.1463 46.7K 601 2
social-media asyncapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 60.0s $0.1510 46.3K 115 2
social-media asyncapi LAP-Std t2 completed 1.000 1.000 1.000 1.000 52.0s $0.1340 45.6K 115 2
social-media asyncapi LAP-Lean t1 completed 0.995 1.000 1.000 0.950 47.5s $0.1363 45.7K 115 2
social-media asyncapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 57.2s $0.2093 45.6K 115 2
streetlights asyncapi None t1 completed 0.267 0.250 0.250 0.300 38.2s $0.1160 38.3K 0 1
streetlights asyncapi None t2 completed 0.180 0.000 0.250 0.300 31.9s $0.1043 37.8K 0 1
streetlights asyncapi Pretty t1 completed 1.000 1.000 1.000 1.000 55.2s $0.1469 47.8K 1.5K 2
streetlights asyncapi Pretty t2 completed 1.000 1.000 1.000 1.000 36.6s $0.1413 77.1K 1.5K 2
streetlights asyncapi Minified t1 completed 0.825 0.500 1.000 1.000 44.1s $0.1624 77.9K 1.5K 2
streetlights asyncapi Minified t2 completed 0.825 0.500 1.000 1.000 37.3s $0.1439 77.2K 1.5K 2
streetlights asyncapi LAP-Std t1 completed 1.000 1.000 1.000 1.000 32.2s $0.1345 75.6K 596 2
streetlights asyncapi LAP-Std t2 completed 1.000 1.000 1.000 1.000 36.2s $0.1339 75.6K 596 2
streetlights asyncapi LAP-Lean t1 completed 1.000 1.000 1.000 1.000 34.9s $0.1350 75.5K 443 2
streetlights asyncapi LAP-Lean t2 completed 1.000 1.000 1.000 1.000 31.1s $0.1265 75.2K 443 2
websocket-gemini asyncapi None t1 completed 0.033 0.000 0.000 0.333 52.2s $0.1297 24.0K 0 1
websocket-gemini asyncapi None t2 completed 0.887 1.000 0.667 0.867 60.1s $0.1424 24.5K 0 1
websocket-gemini asyncapi Pretty t1 completed 0.990 1.000 1.000 0.900 66.5s $0.1817 49.6K 2.0K 2
websocket-gemini asyncapi Pretty t2 completed 0.990 1.000 1.000 0.900 57.4s $0.1579 48.6K 2.0K 2
websocket-gemini asyncapi Minified t1 completed 0.990 1.000 1.000 0.900 71.7s $0.1867 49.6K 1.9K 2
websocket-gemini asyncapi Minified t2 completed 0.990 1.000 1.000 0.900 64.5s $0.1689 48.9K 1.9K 2
websocket-gemini asyncapi LAP-Std t1 completed 0.820 1.000 0.500 0.700 63.3s $0.1556 46.2K 82 2
websocket-gemini asyncapi LAP-Std t2 completed 0.773 1.000 0.333 0.733 49.8s $0.1246 45.1K 82 2
websocket-gemini asyncapi LAP-Lean t1 completed 0.813 1.000 0.500 0.633 52.5s $0.1361 45.5K 70 2
websocket-gemini asyncapi LAP-Lean t2 completed 0.780 1.000 0.333 0.800 46.4s $0.1175 44.8K 70 2
artsy-gql graphql None t1 completed 0.448 0.500 0.333 0.483 60.3s $0.1480 24.8K 0 1
artsy-gql graphql None t2 completed 0.069 0.000 0.143 0.257 46.1s $0.1201 23.7K 0 1
artsy-gql graphql Pretty t1 completed 0.567 0.500 0.667 0.667 158.9s $0.5158 360.6K 174.6K 16
artsy-gql graphql Pretty t2 completed 0.860 1.000 0.571 0.886 148.5s $0.6184 479.0K 174.6K 19
artsy-gql graphql Minified t1 completed 0.567 0.500 0.667 0.667 122.2s $0.3796 201.9K 86.8K 14
artsy-gql graphql Minified t2 completed 0.860 1.000 0.571 0.886 152.9s $0.7321 497.0K 86.8K 17
artsy-gql graphql LAP-Std t1 completed 0.560 0.500 0.667 0.600 176.4s $0.9636 848.6K 83.7K 18
artsy-gql graphql LAP-Std t2 completed 0.903 1.000 0.714 0.886 150.1s $0.5377 410.6K 83.7K 11
artsy-gql graphql LAP-Lean t1 completed 0.887 1.000 0.667 0.867 122.5s $0.4953 254.8K 76.2K 13
artsy-gql graphql LAP-Lean t2 completed 0.860 1.000 0.571 0.886 196.1s $0.9527 969.1K 76.2K 21
coral-gql graphql None t1 completed 0.082 0.000 0.182 0.273 49.1s $0.1264 23.9K 0 1
coral-gql graphql None t2 completed 0.020 0.000 0.000 0.200 50.8s $0.1253 23.9K 0 1
coral-gql graphql Pretty t1 completed 0.907 1.000 0.727 0.891 113.6s $0.3726 290.8K 59.2K 9
coral-gql graphql Pretty t2 completed 0.700 0.500 1.000 1.000 164.0s $0.6061 522.1K 59.2K 16
coral-gql graphql Minified t1 completed 1.000 1.000 1.000 1.000 130.5s $0.4793 303.0K 48.6K 17
coral-gql graphql Minified t2 completed 1.000 1.000 1.000 1.000 100.3s $0.2985 176.3K 48.6K 10
coral-gql graphql LAP-Std t1 completed 1.000 1.000 1.000 1.000 119.3s $0.3768 192.0K 16.8K 10
coral-gql graphql LAP-Std t2 completed 1.000 1.000 1.000 1.000 91.3s $0.2955 153.5K 16.8K 10
coral-gql graphql LAP-Lean t1 completed 1.000 1.000 1.000 1.000 151.4s $0.4666 311.5K 13.9K 16
coral-gql graphql LAP-Lean t2 completed 0.700 0.500 1.000 1.000 106.9s $0.4093 321.2K 13.9K 12
elastic-gql graphql None t1 completed 0.360 0.000 1.000 0.600 52.4s $0.1303 24.1K 0 1
elastic-gql graphql None t2 completed 0.360 0.000 1.000 0.600 56.0s $0.1335 24.2K 0 1
elastic-gql graphql Pretty t1 completed 0.700 0.500 1.000 1.000 98.0s $0.2844 127.5K 76.4K 7
elastic-gql graphql Pretty t2 completed 1.000 1.000 1.000 1.000 120.5s $0.4182 276.4K 76.4K 10
elastic-gql graphql Minified t1 completed 0.680 0.500 1.000 0.800 114.5s $0.3513 215.6K 64.0K 8
elastic-gql graphql Minified t2 completed 1.000 1.000 1.000 1.000 102.4s $0.3371 199.7K 64.0K 7
elastic-gql graphql LAP-Std t1 completed 0.700 0.500 1.000 1.000 100.1s $0.4414 369.9K 15.8K 11
elastic-gql graphql LAP-Std t2 completed 1.000 1.000 1.000 1.000 88.7s $0.3433 203.1K 15.8K 6
elastic-gql graphql LAP-Lean t1 completed 0.680 0.500 1.000 0.800 129.4s $0.4750 301.6K 15.7K 11
elastic-gql graphql LAP-Lean t2 completed 1.000 1.000 1.000 1.000 111.9s $0.4227 251.5K 15.7K 9
github-gql graphql None t1 completed 0.505 0.500 0.500 0.550 45.3s $0.1204 23.7K 0 1
github-gql graphql None t2 completed 0.753 1.000 0.286 0.671 50.3s $0.1312 24.2K 0 1
github-gql graphql Pretty t1 completed 0.915 1.000 0.750 0.900 134.7s $0.4809 406.3K 286.0K 18
github-gql graphql Pretty t2 completed 0.854 1.000 0.571 0.829 139.1s $0.5185 419.3K 286.0K 14
github-gql graphql Minified t1 completed 0.915 1.000 0.750 0.900 79.4s $0.2371 122.6K 233.8K 8
github-gql graphql Minified t2 completed 0.854 1.000 0.571 0.829 119.2s $0.3913 276.5K 233.8K 13
github-gql graphql LAP-Std t1 completed 0.915 1.000 0.750 0.900 103.2s $0.5088 290.8K 73.6K 12
github-gql graphql LAP-Std t2 completed 0.854 1.000 0.571 0.829 114.0s $0.5364 288.2K 73.6K 8
github-gql graphql LAP-Lean t1 completed 0.915 1.000 0.750 0.900 142.7s $0.6596 520.8K 50.4K 15
github-gql graphql LAP-Lean t2 completed 0.854 1.000 0.571 0.829 109.4s $0.5155 255.5K 50.4K 10
linear-gql graphql None t1 completed 1.000 1.000 1.000 1.000 49.5s $0.1271 24.0K 0 1
linear-gql graphql None t2 completed 0.133 0.000 0.333 0.333 47.4s $0.1205 23.7K 0 1
linear-gql graphql Pretty t1 completed 1.000 1.000 1.000 1.000 171.4s $0.6899 551.8K 205.9K 24
linear-gql graphql Pretty t2 completed 0.567 0.500 0.667 0.667 114.4s $0.4097 269.6K 205.9K 15
linear-gql graphql Minified t1 completed 1.000 1.000 1.000 1.000 155.9s $0.5555 362.9K 164.2K 18
linear-gql graphql Minified t2 completed 0.567 0.500 0.667 0.667 166.6s $0.7284 642.9K 164.2K 17
linear-gql graphql LAP-Std t1 completed 0.700 0.500 1.000 1.000 147.4s $0.5855 400.2K 57.0K 17
linear-gql graphql LAP-Std t2 completed 0.567 0.500 0.667 0.667 129.5s $0.5512 421.7K 57.0K 13
linear-gql graphql LAP-Lean t1 completed 0.700 0.500 1.000 1.000 121.1s $0.5000 275.6K 47.1K 14
linear-gql graphql LAP-Lean t2 completed 0.567 0.500 0.667 0.667 134.8s $0.5622 383.1K 47.1K 10
saleor-gql graphql None t1 completed 0.481 0.500 0.429 0.521 44.4s $0.1092 23.3K 0 1
saleor-gql graphql None t2 completed 0.105 0.000 0.250 0.300 42.1s $0.1139 23.4K 0 1
saleor-gql graphql Pretty t1 completed 0.951 1.000 0.857 0.943 133.8s $0.4124 228.3K 221.2K 11
saleor-gql graphql Pretty t2 completed 0.700 0.500 1.000 1.000 106.4s $0.3501 246.1K 221.2K 12
saleor-gql graphql Minified t1 completed 0.951 1.000 0.857 0.943 100.0s $0.3438 171.0K 188.6K 10
saleor-gql graphql Minified t2 completed 0.915 1.000 0.750 0.900 123.0s $0.4518 361.7K 188.6K 13
saleor-gql graphql LAP-Std t1 completed 0.704 0.750 0.571 0.829 173.2s $0.7166 611.0K 71.8K 17
saleor-gql graphql LAP-Std t2 completed 0.690 0.500 1.000 0.900 122.3s $0.4191 273.7K 71.8K 15
saleor-gql graphql LAP-Lean t1 completed 0.951 1.000 0.857 0.943 145.3s $0.4974 359.3K 50.8K 17
saleor-gql graphql LAP-Lean t2 completed 0.915 1.000 0.750 0.900 104.6s $0.3826 184.0K 50.8K 9
shopify-gql graphql None t1 completed 0.020 0.000 0.000 0.200 45.5s $0.1215 23.7K 0 1
shopify-gql graphql None t2 completed 0.020 0.000 0.000 0.200 43.2s $0.1153 23.5K 0 1
shopify-gql graphql Pretty t1 completed 1.000 1.000 1.000 1.000 109.6s $0.3335 200.6K 78.5K 9
shopify-gql graphql Pretty t2 completed 1.000 1.000 1.000 1.000 139.3s $0.5045 402.2K 78.5K 13
shopify-gql graphql Minified t1 completed 1.000 1.000 1.000 1.000 97.3s $0.2926 177.8K 50.7K 7
shopify-gql graphql Minified t2 completed 1.000 1.000 1.000 1.000 92.9s $0.2940 157.1K 50.7K 6
shopify-gql graphql LAP-Std t1 completed 1.000 1.000 1.000 1.000 56.4s $0.1810 65.4K 4.6K 2
shopify-gql graphql LAP-Std t2 completed 1.000 1.000 1.000 1.000 62.2s $0.1991 66.2K 4.6K 2
shopify-gql graphql LAP-Lean t1 completed 1.000 1.000 1.000 1.000 52.2s $0.1759 64.3K 3.8K 2
shopify-gql graphql LAP-Lean t2 completed 1.000 1.000 1.000 1.000 64.2s $0.1982 65.2K 3.8K 2
swapi-gql graphql None t1 completed 0.133 0.000 0.333 0.333 48.5s $0.1264 23.9K 0 1
swapi-gql graphql None t2 completed 0.392 0.500 0.167 0.417 46.3s $0.1193 23.7K 0 1
swapi-gql graphql Pretty t1 completed 0.830 1.000 0.500 0.800 58.1s $0.2363 59.9K 8.7K 2
swapi-gql graphql Pretty t2 completed 0.880 1.000 0.667 0.800 59.0s $0.2295 59.7K 8.7K 2
swapi-gql graphql Minified t1 completed 0.830 1.000 0.500 0.800 61.6s $0.2339 59.0K 7.7K 2
swapi-gql graphql Minified t2 completed 0.830 1.000 0.500 0.800 68.0s $0.2385 59.2K 7.7K 2
swapi-gql graphql LAP-Std t1 completed 0.830 1.000 0.500 0.800 54.3s $0.1627 48.5K 1.7K 2
swapi-gql graphql LAP-Std t2 completed 0.880 1.000 0.667 0.800 67.3s $0.1785 62.3K 1.7K 2
swapi-gql graphql LAP-Lean t1 completed 0.830 1.000 0.500 0.800 59.5s $0.1673 48.2K 1.3K 2
swapi-gql graphql LAP-Lean t2 completed 0.830 1.000 0.500 0.800 66.9s $0.1859 48.9K 1.3K 2
unraid-gql graphql None t1 completed 0.020 0.000 0.000 0.200 28.3s $0.0781 22.0K 0 1
unraid-gql graphql None t2 completed 0.020 0.000 0.000 0.200 37.9s $0.1028 23.0K 0 1
unraid-gql graphql Pretty t1 completed 1.000 1.000 1.000 1.000 95.6s $0.3112 180.9K 15.8K 11
unraid-gql graphql Pretty t2 completed 1.000 1.000 1.000 1.000 96.6s $0.3132 183.8K 15.8K 11
unraid-gql graphql Minified t1 completed 1.000 1.000 1.000 1.000 90.4s $0.2864 178.5K 13.1K 10
unraid-gql graphql Minified t2 completed 1.000 1.000 1.000 1.000 119.2s $0.4574 374.9K 13.1K 14
unraid-gql graphql LAP-Std t1 completed 1.000 1.000 1.000 1.000 63.2s $0.2263 57.5K 7.4K 2
unraid-gql graphql LAP-Std t2 completed 1.000 1.000 1.000 1.000 54.2s $0.2052 56.6K 7.4K 2
unraid-gql graphql LAP-Lean t1 completed 1.000 1.000 1.000 1.000 60.3s $0.2206 56.7K 6.8K 2
unraid-gql graphql LAP-Lean t2 completed 1.000 1.000 1.000 1.000 52.9s $0.2069 56.1K 6.8K 2
yelp-gql graphql None t1 completed 0.797 1.000 0.421 0.705 51.8s $0.1317 24.2K 0 1
yelp-gql graphql None t2 completed 0.177 0.000 0.462 0.385 54.0s $0.1357 24.3K 0 1
yelp-gql graphql Pretty t1 completed 0.911 1.000 0.737 0.895 112.8s $0.3797 226.6K 19.3K 10
yelp-gql graphql Pretty t2 completed 0.922 1.000 0.769 0.908 142.2s $0.4863 349.9K 19.3K 14
yelp-gql graphql Minified t1 completed 0.911 1.000 0.737 0.895 96.4s $0.2954 154.1K 12.9K 7
yelp-gql graphql Minified t2 completed 0.922 1.000 0.769 0.908 114.5s $0.4323 324.1K 12.9K 12
yelp-gql graphql LAP-Std t1 completed 0.946 1.000 0.842 0.937 55.1s $0.1580 60.3K 698 2
yelp-gql graphql LAP-Std t2 completed 0.922 1.000 0.769 0.908 64.7s $0.1584 60.4K 698 2
yelp-gql graphql LAP-Lean t1 completed 0.946 1.000 0.842 0.937 58.0s $0.1513 59.8K 604 2
yelp-gql graphql LAP-Lean t2 completed 0.922 1.000 0.769 0.908 62.9s $0.1650 60.3K 604 2
adobe-postman postman None t1 completed 0.895 1.000 0.750 0.700 62.4s $0.1291 24.0K 0 1
adobe-postman postman None t2 completed 0.078 0.000 0.182 0.236 80.0s $0.1645 25.4K 0 1
adobe-postman postman Pretty t1 completed 0.970 1.000 1.000 0.700 60.7s $0.2281 59.6K 9.7K 2
adobe-postman postman Pretty t2 completed 0.969 1.000 0.909 0.964 96.9s $0.2856 61.9K 9.7K 2
adobe-postman postman Minified t1 completed 0.970 1.000 1.000 0.700 63.4s $0.1989 53.8K 6.7K 2
adobe-postman postman Minified t2 completed 0.993 1.000 1.000 0.927 77.7s $0.2214 67.9K 6.7K 2
adobe-postman postman LAP-Std t1 completed 0.970 1.000 1.000 0.700 59.2s $0.1525 47.5K 1.5K 2
adobe-postman postman LAP-Std t2 completed 0.969 1.000 0.909 0.964 73.5s $0.1803 48.6K 1.5K 2
adobe-postman postman LAP-Lean t1 completed 0.980 1.000 1.000 0.800 56.8s $0.1469 46.4K 509 2
adobe-postman postman LAP-Lean t2 completed 0.989 1.000 1.000 0.891 79.5s $0.1844 47.9K 509 2
akeneo-postman postman None t1 completed 0.972 1.000 0.917 0.967 72.1s $0.1572 25.1K 0 1
akeneo-postman postman None t2 completed 0.360 0.000 1.000 0.600 52.0s $0.1217 23.7K 0 1
akeneo-postman postman Pretty t1 completed 1.000 1.000 1.000 1.000 124.0s $0.4342 174.8K 64.6K 15
akeneo-postman postman Pretty t2 completed 1.000 1.000 1.000 1.000 93.0s $0.2896 176.8K 64.6K 9
akeneo-postman postman Minified t1 completed 0.595 0.500 0.750 0.700 288.2s $0.5216 185.2K 46.3K 8
akeneo-postman postman Minified t2 completed 1.000 1.000 1.000 1.000 251.7s $0.3904 174.7K 46.3K 6
akeneo-postman postman LAP-Std t1 completed 1.000 1.000 1.000 1.000 100.6s $0.3199 62.7K 10.2K 2
akeneo-postman postman LAP-Std t2 completed 1.000 1.000 1.000 1.000 65.2s $0.2313 59.1K 10.2K 2
akeneo-postman postman LAP-Lean t1 completed 1.000 1.000 1.000 1.000 92.4s $0.2730 56.2K 5.1K 2
akeneo-postman postman LAP-Lean t2 completed 1.000 1.000 1.000 1.000 55.7s $0.1688 52.0K 5.1K 2
auth0-postman postman None t1 completed 0.538 0.500 0.556 0.711 68.8s $0.1494 24.8K 0 1
auth0-postman postman None t2 completed 0.436 0.500 0.273 0.545 64.9s $0.1432 24.6K 0 1
auth0-postman postman Pretty t1 completed 1.000 1.000 1.000 1.000 95.3s $0.3180 177.7K 22.1K 11
auth0-postman postman Pretty t2 completed 1.000 1.000 1.000 1.000 85.1s $0.2514 125.1K 22.1K 7
auth0-postman postman Minified t1 completed 1.000 1.000 1.000 1.000 130.3s $0.3954 311.6K 14.3K 14
auth0-postman postman Minified t2 completed 1.000 1.000 1.000 1.000 140.1s $0.3978 259.2K 14.3K 10
auth0-postman postman LAP-Std t1 completed 1.000 1.000 1.000 1.000 60.8s $0.1654 49.4K 2.5K 2
auth0-postman postman LAP-Std t2 completed 1.000 1.000 1.000 1.000 64.6s $0.1711 49.7K 2.5K 2
auth0-postman postman LAP-Lean t1 completed 1.000 1.000 1.000 1.000 64.4s $0.1612 48.7K 2.0K 2
auth0-postman postman LAP-Lean t2 completed 1.000 1.000 1.000 1.000 59.3s $0.1649 48.9K 2.0K 2
azure-devops-postman postman None t1 completed 0.069 0.000 0.143 0.257 74.9s $0.1545 25.0K 0 1
azure-devops-postman postman None t2 completed 0.088 0.000 0.200 0.280 53.1s $0.1238 23.8K 0 1
azure-devops-postman postman Pretty t1 completed 1.000 1.000 1.000 1.000 164.2s $0.4956 347.0K 106.6K 17
azure-devops-postman postman Pretty t2 completed 1.000 1.000 1.000 1.000 118.8s $0.3809 269.0K 106.6K 9
azure-devops-postman postman Minified t1 completed 1.000 1.000 1.000 1.000 157.4s $0.5257 480.6K 75.0K 15
azure-devops-postman postman Minified t2 completed 1.000 1.000 1.000 1.000 222.3s $0.4070 175.1K 75.0K 6
azure-devops-postman postman LAP-Std t1 completed 1.000 1.000 1.000 1.000 101.9s $0.2913 155.3K 12.8K 10
azure-devops-postman postman LAP-Std t2 completed 1.000 1.000 1.000 1.000 84.4s $0.2340 144.4K 12.8K 6
azure-devops-postman postman LAP-Lean t1 completed 1.000 1.000 1.000 1.000 67.2s $0.2296 57.7K 8.1K 2
azure-devops-postman postman LAP-Lean t2 completed 1.000 1.000 1.000 1.000 64.0s $0.2138 57.1K 8.1K 2
braintree-postman postman None t1 completed 0.020 0.000 0.000 0.200 60.5s $0.1430 24.5K 0 1
braintree-postman postman None t2 completed 0.020 0.000 0.000 0.200 53.9s $0.1159 23.5K 0 1
braintree-postman postman Pretty t1 completed 0.360 0.000 1.000 0.600 119.5s $0.3697 178.7K 34.6K 10
braintree-postman postman Pretty t2 completed 0.360 0.000 1.000 0.600 113.4s $0.3408 190.1K 34.6K 7
braintree-postman postman Minified t1 completed 0.360 0.000 1.000 0.600 238.6s $0.5547 150.4K 22.4K 8
braintree-postman postman Minified t2 completed 0.360 0.000 1.000 0.600 226.2s $0.4653 148.6K 22.4K 5
braintree-postman postman LAP-Std t1 completed 0.360 0.000 1.000 0.600 84.2s $0.1962 50.0K 1.8K 2
braintree-postman postman LAP-Std t2 completed 0.360 0.000 1.000 0.600 68.0s $0.1698 48.9K 1.8K 2
braintree-postman postman LAP-Lean t1 completed 0.020 0.000 0.000 0.200 62.8s $0.1641 96.2K 1.4K 4
braintree-postman postman LAP-Lean t2 completed 0.360 0.000 1.000 0.600 91.2s $0.2028 73.6K 1.4K 3
influxdb-postman postman None t1 completed 0.772 0.750 0.800 0.820 63.9s $0.1325 24.1K 0 1
influxdb-postman postman None t2 completed 0.951 1.000 0.857 0.943 49.4s $0.1095 23.2K 0 1
influxdb-postman postman Pretty t1 completed 1.000 1.000 1.000 1.000 139.7s $0.3907 240.5K 80.5K 14
influxdb-postman postman Pretty t2 completed 1.000 1.000 1.000 1.000 123.7s $0.3733 202.9K 80.5K 16
influxdb-postman postman Minified t1 completed 1.000 1.000 1.000 1.000 276.2s $0.9580 889.6K 64.0K 27
influxdb-postman postman Minified t2 completed 0.951 1.000 0.857 0.943 247.4s $0.4244 150.8K 64.0K 8
influxdb-postman postman LAP-Std t1 completed 1.000 1.000 1.000 1.000 76.6s $0.2121 55.1K 5.8K 2
influxdb-postman postman LAP-Std t2 completed 1.000 1.000 1.000 1.000 66.8s $0.2011 54.6K 5.8K 2
influxdb-postman postman LAP-Lean t1 completed 1.000 1.000 1.000 1.000 73.6s $0.2048 53.4K 4.5K 2
influxdb-postman postman LAP-Lean t2 completed 0.951 1.000 0.857 0.943 68.9s $0.2018 53.3K 4.5K 2
postman-echo postman None t1 completed 0.660 1.000 0.000 0.600 48.1s $0.1099 23.2K 0 1
postman-echo postman None t2 completed 0.700 1.000 0.000 1.000 55.4s $0.1222 23.7K 0 1
postman-echo postman Pretty t1 completed 1.000 1.000 1.000 1.000 49.8s $0.1549 49.6K 2.6K 2
postman-echo postman Pretty t2 completed 0.695 1.000 0.000 0.950 72.0s $0.1726 50.4K 2.6K 2
postman-echo postman Minified t1 completed 1.000 1.000 1.000 1.000 58.1s $0.1415 47.2K 1.7K 2
postman-echo postman Minified t2 completed 0.655 1.000 0.000 0.550 62.4s $0.1543 47.7K 1.7K 2
postman-echo postman LAP-Std t1 completed 0.995 1.000 1.000 0.950 56.6s $0.1258 45.3K 256 2
postman-echo postman LAP-Std t2 completed 1.000 1.000 1.000 1.000 50.1s $0.1295 45.4K 256 2
postman-echo postman LAP-Lean t1 completed 1.000 1.000 1.000 1.000 46.2s $0.1213 45.1K 193 2
postman-echo postman LAP-Lean t2 completed 1.000 1.000 1.000 1.000 62.1s $0.1407 45.9K 193 2
sap-postman postman None t1 completed 0.048 0.000 0.083 0.233 72.5s $0.1558 25.1K 0 1
sap-postman postman None t2 completed 0.088 0.000 0.200 0.280 75.0s $0.1658 25.4K 0 1
sap-postman postman Pretty t1 completed 0.218 0.000 0.583 0.433 101.8s $0.2793 127.5K 127.7K 5
sap-postman postman Pretty t2 completed 1.000 1.000 1.000 1.000 151.1s $0.5739 304.1K 127.7K 12
sap-postman postman Minified t1 completed 0.218 0.000 0.583 0.433 286.5s $0.5890 149.9K 89.2K 6
sap-postman postman Minified t2 completed 0.782 0.750 0.800 0.920 348.5s $0.6617 211.5K 89.2K 7
sap-postman postman LAP-Std t1 completed 1.000 1.000 1.000 1.000 75.9s $0.2721 62.9K 10.5K 2
sap-postman postman LAP-Std t2 completed 1.000 1.000 1.000 1.000 94.2s $0.3066 64.2K 10.5K 2
sap-postman postman LAP-Lean t1 completed 1.000 1.000 1.000 1.000 72.4s $0.2481 60.2K 8.9K 2
sap-postman postman LAP-Lean t2 completed 1.000 1.000 1.000 1.000 85.9s $0.2732 61.2K 8.9K 2
stripe-postman postman None t1 completed 1.000 1.000 1.000 1.000 51.8s $0.1203 23.6K 0 1
stripe-postman postman None t2 completed 0.776 0.750 0.857 0.693 59.2s $0.1238 23.8K 0 1
stripe-postman postman Pretty t1 completed 0.680 0.500 1.000 0.800 81.8s $0.2152 96.8K 52.0K 6
stripe-postman postman Pretty t2 completed 1.000 1.000 1.000 1.000 93.1s $0.2858 163.2K 52.0K 7
stripe-postman postman Minified t1 completed 1.000 1.000 1.000 1.000 296.6s $0.5955 210.4K 27.5K 8
stripe-postman postman Minified t2 completed 0.951 1.000 0.857 0.943 158.9s $0.4919 115.9K 27.5K 5
stripe-postman postman LAP-Std t1 completed 1.000 1.000 1.000 1.000 58.2s $0.2177 58.4K 8.5K 2
stripe-postman postman LAP-Std t2 completed 0.850 0.750 1.000 1.000 71.0s $0.2328 59.0K 8.5K 2
stripe-postman postman LAP-Lean t1 completed 1.000 1.000 1.000 1.000 66.1s $0.2082 56.3K 7.0K 2
stripe-postman postman LAP-Lean t2 completed 0.850 0.750 1.000 1.000 54.3s $0.2009 56.0K 7.0K 2
twilio-postman postman None t1 completed 0.105 0.000 0.250 0.300 59.0s $0.1211 23.7K 0 1
twilio-postman postman None t2 completed 0.020 0.000 0.000 0.200 70.8s $0.1640 25.4K 0 1
twilio-postman postman Pretty t1 completed 1.000 1.000 1.000 1.000 53.1s $0.1562 50.0K 3.3K 2
twilio-postman postman Pretty t2 completed 1.000 1.000 1.000 1.000 64.9s $0.1801 50.9K 3.3K 2
twilio-postman postman Minified t1 completed 1.000 1.000 1.000 1.000 63.6s $0.1613 48.5K 2.1K 2
twilio-postman postman Minified t2 completed 1.000 1.000 1.000 1.000 65.5s $0.1638 48.5K 2.1K 2
twilio-postman postman LAP-Std t1 completed 1.000 1.000 1.000 1.000 48.6s $0.1280 45.5K 308 2
twilio-postman postman LAP-Std t2 completed 1.000 1.000 1.000 1.000 54.9s $0.1360 45.8K 308 2
twilio-postman postman LAP-Lean t1 completed 1.000 1.000 1.000 1.000 50.7s $0.1313 45.6K 250 2
twilio-postman postman LAP-Lean t2 completed 1.000 1.000 1.000 1.000 48.7s $0.1281 45.4K 250 2
google-billing protobuf None t1 completed 0.190 0.000 0.500 0.400 51.4s $0.1153 30.3K 0 1
google-billing protobuf None t2 completed 0.260 0.000 0.667 0.600 39.4s $0.1025 29.7K 0 1
google-billing protobuf Pretty t1 completed 0.360 0.000 1.000 0.600 67.8s $0.1951 66.5K 4.9K 2
google-billing protobuf Pretty t2 completed 0.260 0.000 0.667 0.600 46.4s $0.1709 65.6K 4.9K 2
google-billing protobuf Minified t1 completed 0.360 0.000 1.000 0.600 60.7s $0.1702 62.2K 1.7K 2
google-billing protobuf Minified t2 completed 0.260 0.000 0.667 0.600 47.5s $0.1472 61.3K 1.7K 2
google-billing protobuf LAP-Std t1 completed 1.000 1.000 1.000 1.000 51.0s $0.1424 59.2K 380 2
google-billing protobuf LAP-Std t2 completed 0.360 0.000 1.000 0.600 38.7s $0.1168 58.1K 380 2
google-billing protobuf LAP-Lean t1 completed 0.990 1.000 1.000 0.900 53.7s $0.1372 59.0K 367 2
google-billing protobuf LAP-Lean t2 completed 0.990 1.000 1.000 0.900 42.9s $0.1197 58.2K 367 2
google-datacatalog protobuf None t1 completed 0.105 0.000 0.250 0.300 58.7s $0.1138 30.2K 0 1
google-datacatalog protobuf None t2 completed 0.990 1.000 1.000 0.900 48.2s $0.1114 30.1K 0 1
google-datacatalog protobuf Pretty t1 completed 0.948 1.000 0.875 0.850 172.4s $0.6973 705.7K 20.8K 20
google-datacatalog protobuf Pretty t2 completed 0.360 0.000 1.000 0.600 214.3s $0.9392 1.01M 20.8K 29
google-datacatalog protobuf Minified t1 completed 0.985 1.000 1.000 0.850 83.2s $0.2079 67.8K 4.6K 2
google-datacatalog protobuf Minified t2 completed 0.990 1.000 1.000 0.900 83.0s $0.2366 68.9K 4.6K 2
google-datacatalog protobuf LAP-Std t1 completed 0.990 1.000 1.000 0.900 60.0s $0.1592 61.4K 1.8K 2
google-datacatalog protobuf LAP-Std t2 completed 0.360 0.000 1.000 0.600 92.0s $0.2396 95.3K 1.8K 3
google-datacatalog protobuf LAP-Lean t1 completed 0.990 1.000 1.000 0.900 57.5s $0.1567 61.3K 1.7K 2
google-datacatalog protobuf LAP-Lean t2 completed 0.360 0.000 1.000 0.600 61.1s $0.1693 61.8K 1.7K 2
google-firestore protobuf None t1 completed 0.088 0.000 0.200 0.280 46.1s $0.1034 29.8K 0 1
google-firestore protobuf None t2 completed 0.425 0.500 0.250 0.500 42.4s $0.0996 29.6K 0 1
google-firestore protobuf Pretty t1 completed 0.842 1.000 0.600 0.620 166.0s $0.4569 468.6K 10.9K 14
google-firestore protobuf Pretty t2 completed 0.595 0.500 0.750 0.700 147.2s $0.4896 484.6K 10.9K 14
google-firestore protobuf Minified t1 completed 0.505 0.500 0.500 0.550 61.5s $0.1724 63.0K 2.2K 2
google-firestore protobuf Minified t2 completed 0.595 0.500 0.750 0.700 55.6s $0.1530 62.2K 2.2K 2
google-firestore protobuf LAP-Std t1 completed 0.890 1.000 0.700 0.800 59.8s $0.1479 60.1K 963 2
google-firestore protobuf LAP-Std t2 completed 0.595 0.500 0.750 0.700 41.8s $0.1225 59.0K 963 2
google-firestore protobuf LAP-Lean t1 completed 0.868 1.000 0.700 0.580 69.2s $0.1426 59.8K 897 2
google-firestore protobuf LAP-Lean t2 completed 0.595 0.500 0.750 0.700 55.9s $0.1513 60.1K 897 2
google-language protobuf None t1 completed 0.820 1.000 0.500 0.700 50.4s $0.1221 30.5K 0 1
google-language protobuf None t2 completed 0.020 0.000 0.000 0.200 52.4s $0.1250 30.6K 0 1
google-language protobuf Pretty t1 completed 0.360 0.000 1.000 0.600 61.1s $0.2304 72.4K 8.0K 2
google-language protobuf Pretty t2 completed 0.675 0.500 1.000 0.750 66.5s $0.2340 72.6K 8.0K 2
google-language protobuf Minified t1 completed 0.820 1.000 0.500 0.700 50.5s $0.1685 63.5K 2.7K 2
google-language protobuf Minified t2 completed 0.607 0.500 0.800 0.670 62.4s $0.1833 64.1K 2.7K 2
google-language protobuf LAP-Std t1 completed 0.360 0.000 1.000 0.600 55.8s $0.1520 60.3K 1.1K 2
google-language protobuf LAP-Std t2 completed 0.607 0.500 0.800 0.670 61.3s $0.1540 60.3K 1.1K 2
google-language protobuf LAP-Lean t1 completed 0.360 0.000 1.000 0.600 64.5s $0.1620 60.6K 1.1K 2
google-language protobuf LAP-Lean t2 completed 0.607 0.500 0.800 0.670 56.9s $0.1504 60.1K 1.1K 2
google-pubsub protobuf None t1 completed 0.088 0.000 0.200 0.280 43.7s $0.1112 30.1K 0 1
google-pubsub protobuf None t2 completed 0.335 0.500 0.000 0.350 46.5s $0.1090 30.0K 0 1
google-pubsub protobuf Pretty t1 completed 0.675 0.500 1.000 0.750 95.3s $0.3851 349.8K 25.2K 13
google-pubsub protobuf Pretty t2 completed 0.922 1.000 0.800 0.820 71.2s $0.3058 202.3K 25.2K 6
google-pubsub protobuf Minified t1 completed 0.525 0.250 1.000 0.750 82.8s $0.2124 66.6K 3.6K 2
google-pubsub protobuf Minified t2 completed 0.922 1.000 0.800 0.820 46.2s $0.1678 64.7K 3.6K 2
google-pubsub protobuf LAP-Std t1 completed 0.675 0.500 1.000 0.750 46.3s $0.1313 59.0K 545 2
google-pubsub protobuf LAP-Std t2 completed 0.932 1.000 0.800 0.920 46.4s $0.1331 59.1K 545 2
google-pubsub protobuf LAP-Lean t1 completed 0.990 1.000 1.000 0.900 51.3s $0.1332 59.0K 465 2
google-pubsub protobuf LAP-Lean t2 completed 1.000 1.000 1.000 1.000 50.8s $0.1259 58.7K 465 2
google-spanner protobuf None t1 completed 0.437 0.500 0.300 0.470 45.3s $0.0998 29.6K 0 1
google-spanner protobuf None t2 completed 0.171 0.000 0.444 0.378 56.7s $0.1176 30.3K 0 1
google-spanner protobuf Pretty t1 completed 0.501 0.500 0.500 0.510 107.0s $0.3868 361.9K 14.6K 11
google-spanner protobuf Pretty t2 completed 0.966 1.000 1.000 0.661 133.8s $0.5528 567.4K 14.6K 16
google-spanner protobuf Minified t1 completed 0.899 1.000 0.800 0.590 54.9s $0.1520 62.5K 2.5K 2
google-spanner protobuf Minified t2 completed 0.986 1.000 1.000 0.856 56.7s $0.1641 62.9K 2.5K 2
google-spanner protobuf LAP-Std t1 completed 0.604 0.500 0.800 0.640 58.5s $0.1377 59.5K 857 2
google-spanner protobuf LAP-Std t2 completed 0.996 1.000 1.000 0.956 64.4s $0.1491 60.0K 857 2
google-spanner protobuf LAP-Lean t1 completed 0.974 1.000 1.000 0.740 62.0s $0.1514 60.0K 829 2
google-spanner protobuf LAP-Lean t2 completed 1.000 1.000 1.000 1.000 60.2s $0.1438 59.7K 829 2
google-storage protobuf None t1 completed 0.077 0.000 0.167 0.267 61.4s $0.2071 24.2K 0 1
google-storage protobuf None t2 completed 0.105 0.000 0.250 0.300 49.6s $0.2365 30.2K 0 1
google-storage protobuf Pretty t1 completed 0.820 1.000 0.500 0.700 118.0s $0.9517 229.8K 32.4K 13
google-storage protobuf Pretty t2 completed 0.955 1.000 0.917 0.800 105.6s $0.3343 207.9K 32.4K 6
google-storage protobuf Minified t1 completed 0.820 1.000 0.500 0.700 62.9s $0.2973 59.3K 7.7K 2
google-storage protobuf Minified t2 completed 0.877 1.000 0.667 0.767 75.5s $0.2518 73.5K 7.7K 2
google-storage protobuf LAP-Std t1 completed 0.855 1.000 0.583 0.800 49.7s $0.1589 59.7K 1.3K 2
google-storage protobuf LAP-Std t2 completed 0.883 1.000 0.667 0.833 66.9s $0.1684 61.3K 1.3K 2
google-storage protobuf LAP-Lean t1 completed 0.858 1.000 0.583 0.833 67.2s $0.2805 60.6K 1.3K 2
google-storage protobuf LAP-Lean t2 completed 0.952 1.000 0.917 0.767 60.8s $0.1578 60.8K 1.3K 2
google-talent protobuf None t1 completed 0.166 0.000 0.429 0.371 50.4s $0.1304 30.9K 0 1
google-talent protobuf None t2 completed 0.026 0.000 0.000 0.257 44.2s $0.1112 30.1K 0 1
google-talent protobuf Pretty t1 completed 0.990 1.000 1.000 0.900 169.7s $0.6879 684.1K 10.9K 18
google-talent protobuf Pretty t2 completed 0.311 0.000 0.857 0.543 165.2s $0.6726 627.0K 10.9K 26
google-talent protobuf Minified t1 completed 0.990 1.000 1.000 0.900 55.8s $0.1603 61.5K 1.5K 2
google-talent protobuf Minified t2 completed 0.941 1.000 0.857 0.843 62.8s $0.1761 62.1K 1.5K 2
google-talent protobuf LAP-Std t1 completed 0.360 0.000 1.000 0.600 65.6s $0.1703 60.4K 547 2
google-talent protobuf LAP-Std t2 completed 0.951 1.000 0.857 0.943 79.1s $0.1518 59.8K 547 2
google-talent protobuf LAP-Lean t1 completed 0.632 0.500 0.857 0.750 76.3s $0.1899 61.2K 539 2
google-talent protobuf LAP-Lean t2 completed 0.941 1.000 0.857 0.843 65.8s $0.1579 60.0K 539 2
google-translate protobuf None t1 completed 0.020 0.000 0.000 0.200 36.2s $0.0958 29.5K 0 1
google-translate protobuf None t2 completed 0.020 0.000 0.000 0.200 51.1s $0.1249 30.6K 0 1
google-translate protobuf Pretty t1 completed 0.786 1.000 0.400 0.660 94.2s $0.2946 217.7K 18.5K 9
google-translate protobuf Pretty t2 completed 0.605 0.500 0.833 0.550 111.4s $0.2904 219.2K 18.5K 7
google-translate protobuf Minified t1 completed 0.360 0.000 1.000 0.600 58.0s $0.1799 65.0K 3.4K 2
google-translate protobuf Minified t2 completed 0.983 1.000 1.000 0.833 55.4s $0.1806 65.0K 3.4K 2
google-translate protobuf LAP-Std t1 completed 0.360 0.000 1.000 0.600 53.1s $0.1355 59.6K 883 2
google-translate protobuf LAP-Std t2 completed 0.927 1.000 0.833 0.767 60.5s $0.1582 60.5K 883 2
google-translate protobuf LAP-Lean t1 completed 0.360 0.000 1.000 0.600 54.7s $0.1480 60.1K 837 2
google-translate protobuf LAP-Lean t2 completed 0.618 0.500 0.833 0.683 53.7s $0.1430 59.9K 837 2
google-vision protobuf None t1 completed 0.247 0.000 0.667 0.467 54.9s $0.1342 31.0K 0 1
google-vision protobuf None t2 completed 0.566 0.500 0.714 0.521 59.7s $0.1206 30.5K 0 1
google-vision protobuf Pretty t1 completed 0.247 0.000 0.667 0.467 75.4s $0.2660 74.3K 8.9K 2
google-vision protobuf Pretty t2 completed 0.893 1.000 0.714 0.786 67.0s $0.2554 73.9K 8.9K 2
google-vision protobuf Minified t1 completed 0.247 0.000 0.667 0.467 82.8s $0.1974 64.0K 2.3K 2
google-vision protobuf Minified t2 completed 0.263 0.000 0.714 0.486 80.1s $0.1987 64.0K 2.3K 2
google-vision protobuf LAP-Std t1 completed 0.347 0.000 1.000 0.467 68.9s $0.1626 60.5K 1.0K 2
google-vision protobuf LAP-Std t2 completed 0.263 0.000 0.714 0.486 49.7s $0.1503 60.0K 1.0K 2
google-vision protobuf LAP-Lean t1 completed 0.360 0.000 1.000 0.600 66.9s $0.1684 60.8K 1.0K 2
google-vision protobuf LAP-Lean t2 completed 0.263 0.000 0.714 0.486 61.7s $0.1650 60.6K 1.0K 2
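The per-run rows above can be rolled up into the tier-level means quoted in the abstract. A minimal sketch of that aggregation, assuming the whitespace-separated row layout shown here (spec, format, tier, task, status, overall score, then the remaining metrics); the sample rows and field positions are illustrative, not the harness's actual parser:

```python
from collections import defaultdict

def mean_score_by_tier(rows):
    """Group run rows by documentation tier and average the overall score.

    Each row is whitespace-separated:
    spec format tier task status score ... (remaining fields ignored)
    """
    buckets = defaultdict(list)
    for line in rows:
        fields = line.split()
        tier, score = fields[2], float(fields[5])
        buckets[tier].append(score)
    return {tier: sum(s) / len(s) for tier, s in buckets.items()}

# Three rows copied from the table above (trailing metrics trimmed).
sample = [
    "twilio-postman postman None t1 completed 0.105",
    "twilio-postman postman None t2 completed 0.020",
    "twilio-postman postman LAP-Lean t1 completed 1.000",
]
means = mean_score_by_tier(sample)
```

Applied to the full 500-row table, the same grouping yields the documented-vs-none comparison (0.84 vs 0.40) reported in the abstract.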

B Appendix B: Task Definitions

All 100 task definitions used in the benchmark (2 tasks per spec, 50 specs). Each task is phrased in business language to avoid endpoint-revealing technical terms.
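Each row below pairs a business-language prompt with the endpoints the grader expects the generated code to hit. A hypothetical record shape for one such task (the class and field names are illustrative, not the harness's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskDefinition:
    spec: str            # e.g. "box"
    fmt: str             # openapi, asyncapi, graphql, postman, or protobuf
    task_id: str         # "t1" or "t2"
    description: str     # business-language prompt, no endpoint names
    target_endpoints: tuple[str, ...]  # what the grader checks for

# First task from the table below, expressed as a record.
box_t1 = TaskDefinition(
    spec="box",
    fmt="openapi",
    task_id="t1",
    description=("Look up the metadata and properties of a stored document, "
                 "then see who has been granted shared access to it"),
    target_endpoints=("GET /files/{file_id}",
                      "GET /files/{file_id}/collaborations"),
)
```

Keeping the description free of endpoint names is what forces the agent to actually consult the supplied documentation rather than pattern-match on the prompt.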

Spec  Format  Task  Description  Target Endpoints
box openapi t1 Write Python code that would look up the metadata and properties of a stored document, then see who has been granted shared access to that document along with their permission levels GET /files/{file_id}
GET /files/{file_id}/collaborations
box openapi t2 Write Python code that would retrieve the properties of a specific storage container, then browse through the files and sub-containers inside it with sorting and pagination controls GET /folders/{folder_id}
GET /folders/{folder_id}/items
digitalocean openapi t1 Write Python code that would provision a new cloud virtual machine in a specific region with a chosen size and OS image, including SSH access keys and tags, then check the current status and specs of that running instance POST /v2/droplets
GET /v2/droplets/{droplet_id}
digitalocean openapi t2 Write Python code that would register a new DNS entry for a hosted zone with a specific record type, hostname, value, and TTL, then enumerate all existing DNS entries configured under that zone POST /v2/domains/{domain_name}/records
GET /v2/domains/{domain_name}/records
figma openapi t1 Write Python code that would generate scaled raster renderings of selected design components from a project file in a chosen image format, then leave a review note anchored to a specific location on the canvas referencing those components GET /v1/images/{file_key}
POST /v1/files/{file_key}/comments
figma openapi t2 Write Python code that would set up an automated listener that notifies your server whenever design files change within a team workspace, then review the organization's audit trail of recent actions over a specified time window POST /v2/webhooks
GET /v1/activity_logs
github-rest openapi t1 Write Python code that would review all open bug reports and feature requests for a project sorted by recent activity, then file a new work item with a title, description, tags, and assigned team members GET /repos/{owner}/{repo}/issues
POST /repos/{owner}/{repo}/issues
github-rest openapi t2 Write Python code that would look up the metadata and configuration of a specific code project, then browse all the pending code change proposals for that project sorted and paginated GET /repos/{owner}/{repo}
GET /repos/{owner}/{repo}/pulls
plaid openapi t1 Write Python code that would obtain all the bank accounts connected through a user's financial link, then pull their spending and deposit history over a specified date range POST /accounts/get
POST /transactions/get
plaid openapi t2 Write Python code that would generate a temporary session credential for the bank connection onboarding flow specifying the user's locale and desired financial products, then browse the catalog of supported banks and credit unions by country POST /link/token/create
POST /institutions/get
resend openapi t1 Write Python code that would dispatch a notification to a recipient with a subject line, HTML body, and reply-to address, then look up the delivery status and metadata of that specific message POST /emails
GET /emails/{email_id}
resend openapi t2 Write Python code that would register a new custom sender identity for outbound communications in a preferred geographic region, then trigger a DNS validation check to confirm that identity is properly configured POST /domains
POST /domains/{domain_id}/verify
slack openapi t1 Write Python code that would send a rich-text announcement to a team discussion space, optionally as a threaded reply, then pull up the recent message timeline for that same space within a specific date window POST /chat.postMessage
GET /conversations.history
slack openapi t2 Write Python code that would set up a new dedicated discussion space for the team with a chosen visibility level, then browse all the existing discussion spaces filtering out any that have been archived POST /conversations.create
GET /conversations.list
spotify openapi t1 Write Python code that would look up the full metadata of a music release including artwork and label info for a given market, then browse through the individual songs on that release with pagination GET /albums/{id}
GET /albums/{id}/tracks
spotify openapi t2 Write Python code that would pull up the biography and popularity metrics for a specific musician, then find out which of their songs are currently the most streamed in a given country GET /artists/{id}
GET /artists/{id}/top-tracks
stripe openapi t1 Write Python code that would onboard a new entity into the billing system with their contact details and address, then pull up all the payment transactions associated with that entity within a specific time window POST /v1/customers
GET /v1/charges
stripe openapi t2 Write Python code that would add a new item to the merchandise catalog with a name, description, and availability status, then find catalog items that match specific search terms POST /v1/products
GET /v1/products/search
twilio openapi t1 Write Python code that would dispatch a text notification to a phone number from a specific sender with a callback for delivery status, then look up the delivery details and status of that specific notification POST /2010-04-01/Accounts/{AccountSid}/Messages.json
GET /2010-04-01/Accounts/{AccountSid}/Messages/{Sid}.json
twilio openapi t2 Write Python code that would initiate an outbound voice connection to a recipient from a designated number with a handler URL, then access all the audio recordings captured during that voice session POST /2010-04-01/Accounts/{AccountSid}/Calls.json
GET /2010-04-01/Accounts/{AccountSid}/Calls/{CallSid}/Recordings.json
adeo-kafka asyncapi t1 Write Python code that would submit a pricing calculation request with the necessary requester credentials and routing metadata, then listen for the computed pricing result to come back asynchronously PUBLISH adeo-{env}-case-study-COSTING-REQUEST-{version}
SUBSCRIBE costingResponseChannel
adeo-kafka asyncapi t2 Write Python code that would act as the pricing engine: pick up an incoming calculation request, process it, and return the result along with processing timestamps back to the original requester SUBSCRIBE adeo-{env}-case-study-COSTING-REQUEST-{version}
PUBLISH costingResponseChannel
correlation-id asyncapi t1 Write Python code that would listen for real-time brightness sensor data from a specific outdoor lighting device, then issue a remote command to adjust that device's output level, ensuring the request and response can be matched together SUBSCRIBE smartylighting/streetlights/1/0/event/{streetlightId}/lighting/measured
PUBLISH smartylighting/streetlights/1/0/action/{streetlightId}/dim
correlation-id asyncapi t2 Write Python code that would determine how the messaging broker connection is configured including its variable port and security settings, and explain the mechanism used to pair outgoing commands with their corresponding incoming sensor reports for a given device SUBSCRIBE smartylighting/streetlights/1/0/event/{streetlightId}/lighting/measured
PUBLISH smartylighting/streetlights/1/0/action/{streetlightId}/dim
gitter-streaming asyncapi t1 Write Python code that would open a persistent streaming connection to a chat room and extract the author details and message body from each incoming real-time update, while also handling connection keep-alive signals PUBLISH /rooms/{roomId}/{resource} chatMessage
PUBLISH /rooms/{roomId}/{resource} heartbeat
gitter-streaming asyncapi t2 Write Python code that would, from the same chat room stream, extract read receipts, tagged user references, and linked issue references from each conversation item, while monitoring the periodic keep-alive signals PUBLISH /rooms/{roomId}/{resource} chatMessage
PUBLISH /rooms/{roomId}/{resource} heartbeat
kraken-websocket asyncapi t1 Write Python code that would verify the exchange connection is alive using a health check, then register to receive live market data for one or more specific currency pairs SUBSCRIBE / ping
SUBSCRIBE / subscribe
kraken-websocket asyncapi t2 Write Python code that would handle incoming platform-level health status broadcasts and process confirmation responses that indicate whether your data feed registrations were accepted, including the assigned feed identifiers PUBLISH / systemStatus
PUBLISH / subscriptionStatus
operation-security asyncapi t1 Write Python code that would deliver a notification that a user's access permissions have been revoked, using the appropriate authenticated delivery mechanism, and include the required event metadata and signature for secure delivery PUBLISH AUTHORIZATION_REVOCATION message
SECURITY oauth2 sendAuthRevoke
operation-security asyncapi t2 Write Python code that would parse an incoming access revocation event to extract the affected user's identity, the reason for revocation, the date it occurred, and the delivery tracking metadata such as attempt counts and timestamps PUBLISH AUTHORIZATION_REVOCATION message
SCHEMA Notification
rpc-server asyncapi t1 Write Python code that would act as the server side of a remote computation service: accept an incoming request containing a list of numbers to add up, perform the calculation, and return the result to whichever client sent the request SUBSCRIBE rpc_queue sum
PUBLISH {queue} sendSumResult
rpc-server asyncapi t2 Write Python code that would act as the client: send a list of numbers to a remote computation service for summation, then wait for and consume the calculated result that arrives on a dynamically assigned response destination PUBLISH rpc_queue sum
SUBSCRIBE {queue} sendSumResult
slack-rtm asyncapi t1 Write Python code that would post a text message to a team conversation and set up a real-time listener to capture new messages as they arrive from other users PUBLISH / outgoingMessage
SUBSCRIBE / message
slack-rtm asyncapi t2 Write Python code that would monitor for when a teammate is added to a conversation and detect when a brand-new conversation space is set up in the workspace SUBSCRIBE / memberJoinedChannel
SUBSCRIBE / channelCreated
social-media asyncapi t1 Write Python code that would capture a user's reaction on a piece of content from the client-side connection, then broadcast an engagement notification to all backend services so they can update their records SUBSCRIBE like/comment
PUBLISH comment/liked
social-media asyncapi t2 Write Python code that would listen for backend notifications that a content item's engagement data has been modified, then push the refreshed engagement count to the user's browser in real time SUBSCRIBE comment/{commentId}/changed
PUBLISH update/comment/likes
streetlights asyncapi t1 Write Python code that would set up a listener to receive real-time brightness sensor readings from a specific outdoor lighting unit, then send a remote command to switch that unit on SUBSCRIBE smartylighting.streetlights.1.0.event.{streetlightId}.lighting.measured
PUBLISH smartylighting.streetlights.1.0.action.{streetlightId}.turn.on
streetlights asyncapi t2 Write Python code that would remotely shut down one outdoor lighting unit, and then adjust another unit to a partial brightness percentage PUBLISH smartylighting.streetlights.1.0.action.{streetlightId}.turn.off
PUBLISH smartylighting.streetlights.1.0.action.{streetlightId}.dim
websocket-gemini asyncapi t1 Write Python code that would connect to a live feed for a specific cryptocurrency trading pair and parse each incoming update to extract trade executions and order book depth changes PUBLISH /v1/marketdata/{symbol} marketData
SUBSCRIBE /v1/marketdata/{symbol} marketData
websocket-gemini asyncapi t2 Write Python code that would configure the live data feed to only deliver specific event categories such as top-of-book snapshots, bid/offer levels, trade fills, and auction data, while also toggling the connection health check PUBLISH /v1/marketdata/{symbol} marketData
SUBSCRIBE /v1/marketdata/{symbol} marketData
artsy-gql graphql t1 Write Python code that would pull up a particular creator's profile from the art platform, then browse their past sales history at auction houses with pagination QUERY artist
QUERY auctionResultsByArtistsConnection
artsy-gql graphql t2 Write Python code that would browse available pieces in the gallery using filters like price range, availability, and creator, then place a purchase order for a chosen piece QUERY artworksConnection
MUTATION commerceCreateOrderWithArtwork
coral-gql graphql t1 Write Python code that would browse reader discussions across published articles with status and keyword filters, and see available published articles on the platform with pagination QUERY comments
QUERY stories
coral-gql graphql t2 Write Python code that would post a new reader response on a published article, and report an inappropriate reader response so moderators can review it MUTATION createComment
MUTATION createCommentFlag
elastic-gql graphql t1 Write Python code that would establish connections to both a current-generation and a previous-generation search cluster to run queries across two different platform versions simultaneously QUERY elastic77
QUERY elastic68
elastic-gql graphql t2 Write Python code that would access a legacy search cluster for historical data migration alongside a modern cluster for current production data, both requiring host configuration QUERY elastic56
QUERY elastic77
github-gql graphql t1 Write Python code that would find a specific open-source project by its maintainer and project name, then file a bug report against it with a title and description QUERY repository
MUTATION createIssue
github-gql graphql t2 Write Python code that would find developers or projects matching a keyword across the platform, and pull up a particular developer's public profile using their username QUERY search
QUERY user
linear-gql graphql t1 Write Python code that would look up the details of a particular work item in the tracker, then modify its properties such as status, assignee, or priority QUERY issue
MUTATION issueUpdate
linear-gql graphql t2 Write Python code that would view the details of an existing initiative in the tracker, then spin up a brand new initiative with an optional linked team chat channel for notifications QUERY project
MUTATION projectCreate
saleor-gql graphql t1 Write Python code that would look up the details of a specific item in the store catalog using its identifier or URL-friendly name, and pull up a customer's past purchase by its reference number QUERY product
QUERY order
saleor-gql graphql t2 Write Python code that would add a brand new item to the store catalog with its details, and modify an existing fulfillment center's configuration such as address or capacity MUTATION productCreate
MUTATION updateWarehouse
shopify-gql graphql t1 Write Python code that would register a new shopper account on the storefront and then sign them in to obtain a session credential for authenticated operations MUTATION customerCreate
MUTATION customerAccessTokenCreate
shopify-gql graphql t2 Write Python code that would retrieve the currently signed-in shopper's profile details using their session credential, and start a new shopping cart with selected products ready for purchase QUERY customer
MUTATION checkoutCreate
swapi-gql graphql t1 Write Python code that would get a paginated catalog of all movies in the franchise, and retrieve the biographical details of a specific character QUERY allFilms
QUERY person
swapi-gql graphql t2 Write Python code that would browse the full directory of worlds in the franchise with pagination support, and pull up the technical specs of a particular spacecraft QUERY allPlanets
QUERY starship
unraid-gql graphql t1 Write Python code that would pull up the details of a particular access credential on the server, and read a section of the system diagnostic output from a given file location QUERY apiKey
QUERY logFile
unraid-gql graphql t2 Write Python code that would send a new system alert to the server's notification feed, and change the display name of an existing container group on the server MUTATION createNotification
MUTATION renameDockerFolder
yelp-gql graphql t1 Write Python code that would discover nearby restaurants or shops by area and keyword with sorting and filtering options, then read what customers are saying about a particular venue QUERY search
QUERY reviews
yelp-gql graphql t2 Write Python code that would pull up the full profile of a specific venue using its platform identifier, and find upcoming happenings in a given area with date and category filters QUERY business
QUERY event_search
adobe-postman postman t1 Write Python code that would start a new data ingestion job on the experience platform, then attach a file payload to a specific dataset within that job so the data can be processed POST /data/foundation/import/batches
PUT /data/foundation/import/batches/:batchId/datasets/:datasetId/files/:filePath
adobe-postman postman t2 Write Python code that would push real-time event messages through an established streaming pipeline into the platform, then sample a portion of the ingested records to verify the data landed correctly and inspect its format POST /data/foundation/import/collection/:CONNECTION_ID
GET /data/foundation/import/batches/:batchId/datasets/:datasetId/preview
akeneo-postman postman t1 Write Python code that would define a new product classification group in the PIM with its required attributes, label translations, and image settings, then add a product to that group with category assignments, attribute values, and cross-sell associations POST /api/rest/v1/families
POST /api/rest/v1/products
akeneo-postman postman t2 Write Python code that would provision a new product data feed for a connected application, then browse all existing data feeds to confirm the new one appears in the list POST /api/rest/v1/catalogs
GET /api/rest/v1/catalogs
auth0-postman postman t1 Write Python code that would set up a new identity provider link in the tenant specifying the authentication strategy and provider options, then add a custom login pipeline hook with executable logic to run during authentication POST /api/v2/connections
POST /api/v2/rules
auth0-postman postman t2 Write Python code that would onboard a new user account with their credentials, contact info, and profile metadata through a specific identity provider, then authorize an application to access a protected resource by defining its allowed permission scopes POST /api/v2/users
POST /api/v2/client-grants
azure-devops-postman postman t1 Write Python code that would set up a new private package repository with upstream source access and badge visibility settings, then browse the CI/CD pipeline runs for a project to review recent build activity POST /{organization}/_apis/packaging/feeds
GET /{organization}/{project}/_apis/build/builds
azure-devops-postman postman t2 Write Python code that would trigger a new CI pipeline run for a project from a specific source branch with a given reason, then look up the full configuration of a particular pipeline definition to inspect its settings POST /{organization}/{project}/_apis/build/builds
GET /{organization}/{project}/_apis/build/definitions/{definitionId}
braintree-postman postman t1 Write Python code that would register a new buyer profile in the payment gateway, then process a payment against a saved payment method including shipping address and fraud risk assessment details POST /graphql [CreateCustomer]
POST /graphql [ChargePaymentMethod]
braintree-postman postman t2 Write Python code that would place a hold on funds from a buyer's payment method to reserve the transaction amount, then finalize the settlement to move the held funds and complete the purchase POST /graphql [AuthorizePaymentMethod]
POST /graphql [CaptureTransaction]
influxdb-postman postman t1 Write Python code that would ingest sensor or metric measurements into a time series storage container with a specified precision, then run an analytical query against the stored data to validate correctness and inspect the query plan POST /api/v2/write
POST /api/v2/query/analyze
influxdb-postman postman t2 Write Python code that would bootstrap a fresh time series database instance by provisioning the first administrator account, organization, storage container, and access credentials, then enumerate all available storage containers to confirm the setup POST /api/v2/setup
GET /api/v2/buckets
postman-echo postman t1 Write Python code that would submit form data to the diagnostic mirror service and confirm the response echoes it back correctly, then verify the service is healthy by checking for a successful status POST /post
GET /status/200
postman-echo postman t2 Write Python code that would update a resource through the diagnostic service to confirm write-and-reflect behavior, then retrieve a nested document path to verify routing and header forwarding PUT /status/201
GET /path/to/document
sap-postman postman t1 Write Python code that would register a new prospective business opportunity in the CRM with qualification level, classification, source info, and responsible parties, then look it up with full stakeholder details to confirm it was saved correctly POST /sap/byd/odata/cust/v1/khlead/LeadCollection
GET /sap/byd/odata/cust/v1/khlead/LeadCollection
sap-postman postman t2 Write Python code that would place a new commercial order for a buyer including service line items and pricing terms, then pull up the order with its full item breakdown and price calculations to verify accuracy POST /sap/byd/odata/cust/v1/khsalesorder/SalesOrderCollection
GET /sap/byd/odata/cust/v1/khsalesorder/SalesOrderCollection
stripe-postman postman t1 Write Python code that would process a one-time payment for a given amount and currency using a payment source, then register a new customer profile in the billing system with their contact information and default payment method POST /v1/charges
POST /v1/customers
stripe-postman postman t2 Write Python code that would initiate a new payment flow specifying how and when funds should be captured and confirmed from a customer's payment method, then issue a refund against a previously completed transaction POST /v1/payment_intents
POST /v1/refunds
twilio-postman postman t1 Write Python code that would dispatch an over-the-air instruction to a connected IoT device in binary mode, then look up the delivery receipt for that instruction to confirm it was received POST /v1/Commands
GET /v1/Commands/{Command_Sid}
twilio-postman postman t2 Write Python code that would pull the data consumption history for a particular wireless device, then check the pricing and allowance details of the billing plan it is enrolled in GET /v1/Sims/{SimSid}/UsageRecords
GET /v1/RatePlans/{RatePlanSid}
google-billing protobuf t1 Write Python code that would set up a new child payment account under an existing billing arrangement, then retrieve the full list of payment accounts accessible to the current user with filtering and pagination RPC CreateBillingAccount
RPC ListBillingAccounts
google-billing protobuf t2 Write Python code that would look up which payment account a specific cloud project is currently charged to, then change the payment account association so the project bills to a different account RPC GetProjectBillingInfo
RPC UpdateProjectBillingInfo
google-datacatalog protobuf t1 Write Python code that would find data assets across the organization by searching with keywords and scope filters, then set up a new logical grouping to organize the discovered data resources RPC SearchCatalog
RPC CreateEntryGroup
google-datacatalog protobuf t2 Write Python code that would define a reusable metadata schema with custom fields for annotating data assets, then apply an instance of that schema as a metadata annotation on a specific cataloged resource RPC CreateTagTemplate
RPC CreateTag
google-firestore protobuf t1 Write Python code that would search a document database using filter criteria and ordering rules, then obtain several documents at once by their paths in a single bulk retrieval operation RPC RunQuery
RPC BatchGetDocuments
google-firestore protobuf t2 Write Python code that would add a new record to a specific collection in the document database with a given identifier, then persist a set of write operations atomically as part of a transaction RPC CreateDocument
RPC Commit
google-language protobuf t1 Write Python code that would determine the overall emotional tone and opinion expressed in a block of text, then identify and extract the people, places, organizations, and other proper nouns mentioned in it RPC AnalyzeSentiment
RPC AnalyzeEntities
google-language protobuf t2 Write Python code that would automatically sort a piece of content into predefined topic categories, then run a comprehensive linguistic analysis that combines multiple insights like syntax, entities, and sentiment in a single pass RPC ClassifyText
RPC AnnotateText
google-pubsub protobuf t1 Write Python code that would send a batch of messages to a messaging channel, then retrieve a paginated list of all messaging channels available in the project RPC Publish
RPC ListTopics
google-pubsub protobuf t2 Write Python code that would consume pending messages from a message queue and then confirm that each message has been successfully processed so it is not redelivered RPC Pull
RPC Acknowledge
google-spanner protobuf t1 Write Python code that would run a database query using standard SQL within an active database connection, then finalize and persist the resulting changes as a completed transaction RPC ExecuteSql
RPC Commit
google-spanner protobuf t2 Write Python code that would establish a new connection to a cloud database instance, then retrieve specific rows from a table by looking up primary keys with optional column and index selection RPC CreateSession
RPC Read
google-storage protobuf t1 Write Python code that would provision a new container for cloud file storage within a given project, then browse the files already stored in that container with pagination support RPC CreateBucket
RPC ListObjects
google-storage protobuf t2 Write Python code that would access the details and content of a particular file stored in cloud storage, then modify its metadata properties such as access controls and custom attributes RPC GetObject
RPC UpdateObject
google-talent protobuf t1 Write Python code that would publish a new job listing for a company within the recruitment platform, then browse all available job listings with optional filtering criteria and pagination RPC CreateJob
RPC ListJobs
google-talent protobuf t2 Write Python code that would find open positions that match a candidate's skills, location, and preferences using relevance-based ranking, then look up the full details of a specific position by its unique identifier RPC SearchJobs
RPC GetJob
google-translate protobuf t1 Write Python code that would convert a piece of text from one language to another specifying the desired output language, then identify what language an unknown text passage is written in RPC TranslateText
RPC DetectLanguage
google-translate protobuf t2 Write Python code that would set up a custom terminology dictionary to ensure domain-specific terms are translated consistently, then retrieve all such custom dictionaries configured in the project RPC CreateGlossary
RPC ListGlossaries
google-vision protobuf t1 Write Python code that would run image recognition on a set of photos to detect labels, text, and objects, then perform the same analysis on document files like PDFs in a single synchronous request RPC BatchAnnotateImages
RPC BatchAnnotateFiles
google-vision protobuf t2 Write Python code that would kick off a long-running image analysis job that writes results to a cloud storage destination, and separately start an asynchronous document analysis job for large-scale processing of multi-page files RPC AsyncBatchAnnotateImages
RPC AsyncBatchAnnotateFiles
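Each row above pairs a natural-language prompt with the two endpoints a correct solution must call in sequence. As an illustration of what a passing answer looks like, here is a minimal stdlib-only sketch for the stripe-postman t1 task (POST /v1/charges, then POST /v1/customers). Parameter names (`amount`, `currency`, `source`, `email`) follow Stripe's public form-encoded API; the secret key and card token values are placeholders, and the requests are built but not sent.

```python
import base64
import urllib.parse
import urllib.request

STRIPE_API = "https://api.stripe.com"

def charge_payload(amount: int, currency: str, source: str) -> dict:
    # Stripe expects form-encoded bodies; amounts are integers in the
    # currency's smallest unit (2000 == $20.00 USD).
    return {"amount": amount, "currency": currency, "source": source}

def customer_payload(email: str, source: str) -> dict:
    # Registers a customer profile with a default payment method.
    return {"email": email, "source": source}

def build_request(path: str, payload: dict, secret_key: str) -> urllib.request.Request:
    # Stripe authenticates with the secret key as the basic-auth username
    # and an empty password.
    body = urllib.parse.urlencode(payload).encode()
    req = urllib.request.Request(STRIPE_API + path, data=body, method="POST")
    token = base64.b64encode(f"{secret_key}:".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req

# Step 1: process a one-time payment (POST /v1/charges).
charge_req = build_request("/v1/charges",
                           charge_payload(2000, "usd", "tok_visa"),
                           "sk_test_placeholder")
# Step 2: register the customer profile (POST /v1/customers).
customer_req = build_request("/v1/customers",
                             customer_payload("buyer@example.com", "tok_visa"),
                             "sk_test_placeholder")
# urllib.request.urlopen(charge_req) would send the call, given a valid key.
```

The two builders mirror the endpoint sequence listed in the row; under the documented tiers, the agent must recover exactly this endpoint selection, parameter naming, and call ordering from the provided spec.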