
Submission Guide

A complete walkthrough from applying for access to seeing your model on the leaderboard. The whole process typically takes less than a week, with most of the time spent on the 1–3 business day review.

Prerequisites

  • An organization doing legitimate veterinary AI development or research
  • A model with a REST API endpoint (HTTPS), or a containerized inference adapter
  • Agreement to our Data Access Policy and Acceptable Use Policy

End-to-End Process

01

Submit an access request

Go to the Request Access form on the homepage or call POST /api/applications directly. You'll need:

  • Your name and work email
  • Your organization name and type (vendor, academic, etc.)
  • A brief description of the AI tool you want to evaluate and why

You'll receive an immediate confirmation email. Our team reviews within 1–3 business days.
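
If you prefer the API route, the same fields can be assembled into a request body. A minimal Python sketch, with field names assumed from the form fields above (the exact schema is defined by POST /api/applications, so check the API Docs):

```python
import json
import urllib.request

def build_application(name, email, org_name, org_type, description):
    # Field names here are assumptions based on the form fields above;
    # see the API Docs for the authoritative schema.
    return {
        "name": name,
        "email": email,
        "organizationName": org_name,
        "organizationType": org_type,  # e.g. "vendor", "academic"
        "description": description,
    }

payload = build_application(
    "Dana Example", "dana@example.com", "Acme Vet AI", "vendor",
    "Evaluate our clinical summarization model on VAULT.",
)
req = urllib.request.Request(
    "https://animl.health/api/applications",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would submit the request.
```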

02

Review and approval

Our team reviews all requests for legitimacy, appropriate use case, and alignment with benchmark governance. We may contact you with follow-up questions. If approved, you'll receive an email with a link to sign the Participant Agreement.

Note: Approval is not automatic. We accept commercial vendors, academic institutions, and vetted internal QA teams. We do not approve applications that appear intended to circumvent benchmark integrity.

03

Sign the Participant Agreement

Before receiving API credentials, all participants must sign the Participant Agreement, which incorporates our Data Access Policy and Acceptable Use Policy. This is a legal and ethical requirement: violations result in immediate access revocation.

04

Create an API key

Once approved, log in to your dashboard and create an API key with at least the benchmark:run scope:

curl -X POST https://animl.health/api/api-keys \
  -H "Authorization: Bearer vault_sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "CI benchmark key",
    "scopes": ["benchmark:run", "benchmark:read", "model:write"]
  }'

Note: The raw key is returned once. Store it in a secret manager (AWS Secrets Manager, GitHub Actions secret, etc.) immediately.

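
Once the key is in a secret manager, load it at runtime rather than hardcoding it in source or shell history. A small helper (the VAULT_API_KEY variable name is our own choice, not prescribed by the platform):

```python
import os

def auth_headers(env_var="VAULT_API_KEY"):
    # Pull the raw key from the environment, where your secret manager
    # or CI system injects it, rather than from source code.
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"{env_var} is not set; load it from your secret manager")
    return {"Authorization": f"Bearer {key}"}
```
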
05

Register your model

Register the inference endpoint you want VAULT to call:

curl -X POST https://animl.health/api/models \
  -H "Authorization: Bearer vault_sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Vet Summarizer v2",
    "endpointType": "REST_API",
    "endpointUrl": "https://api.acmevetai.com/summarize",
    "authHeaderName": "X-API-Key",
    "authHeaderValue": "my-model-api-key"
  }'

Your endpoint must accept POST with a JSON case input and return a JSON summary. See the API Docs for the exact contract.
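
As a shape reference only, a minimal endpoint might look like the sketch below. The JSON field names caseText and summary are placeholders, not the real contract, and the X-API-Key value mirrors the registration example above:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize(case: dict) -> dict:
    # Placeholder model logic; replace with your inference call.
    # "caseText" and "summary" are assumed field names -- see the
    # API Docs for the actual request/response contract.
    return {"summary": case.get("caseText", "")[:500]}

class VaultHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Enforce the auth header you registered for this model.
        if self.headers.get("X-API-Key") != "my-model-api-key":
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        case = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(summarize(case)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8080), VaultHandler).serve_forever()
```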

06

Trigger a benchmark run

curl -X POST https://animl.health/api/benchmark-runs \
  -H "Authorization: Bearer vault_sk_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: acme-v2-run-001" \
  -d '{
    "modelId": "clxyz...",
    "benchmarkSuiteId": "clsuite_clinical_summarization_v1_3",
    "label": "v2.1 release candidate",
    "notifyWebhook": "https://your-server.com/hooks/vault"
  }'

The run is queued immediately. VAULT will call your endpoint once per case, in random order, within a sandboxed environment. Your model never receives the case identifiers.

07

Poll for results or wait for webhook

# Poll every 60 seconds
curl https://animl.health/api/benchmark-runs/<runId> \
  -H "Authorization: Bearer vault_sk_live_..."

# Or configure a webhook (step 5) to be notified automatically

When status reaches COMPLETE, your scores are available under metrics. A detailed report is generated and accessible from your dashboard.
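
The polling loop above can be sketched as a small helper. The fetch callable is injected so any HTTP client can be plugged in, and the FAILED status value is an assumption about the API's status vocabulary:

```python
import time

def wait_for_run(fetch, run_id, interval_s=60, timeout_s=6 * 3600):
    # `fetch(run_id)` should return the decoded JSON for
    # GET /api/benchmark-runs/<runId>.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        run = fetch(run_id)
        if run["status"] in ("COMPLETE", "FAILED"):
            return run
        time.sleep(interval_s)
    raise TimeoutError(f"run {run_id} did not finish within {timeout_s}s")
```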

Typical run duration for the full 5,000-case suite depends on your model's latency. Estimate: cases × latency ÷ effective parallelism, where VAULT applies parallelism at its discretion, up to your endpoint's concurrency limits.
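
That back-of-the-envelope estimate, with an assumed effective parallelism:

```python
def estimate_run_seconds(cases: int, median_latency_s: float, parallelism: int) -> float:
    # Rough wall-clock estimate: cases x latency / effective parallelism.
    # Actual parallelism is chosen by VAULT up to your concurrency limits.
    return cases * median_latency_s / max(parallelism, 1)

# e.g. the 5,000-case suite at 2 s median latency and 10-way parallelism:
estimate_run_seconds(5000, 2.0, 10)  # -> 1000.0 seconds, roughly 17 minutes
```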

08

Request leaderboard publication

Your results are private by default. If you want to appear on the public leaderboard:

  • Consent to publication from your dashboard
  • Our team reviews the submission (typically within 2 business days)
  • Once approved, your entry is added to the leaderboard

Note: Only your model name, organization, scores, and benchmark date are published. Raw outputs are never published.

Common Questions

What if my model times out on some cases?

Each case has a 30-second timeout. Timed-out cases are scored as failures and reduce your composite score. Ensure your endpoint can reliably respond within 30 seconds under load.
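
One defensive pattern (a sketch, not something VAULT requires) is to cap your own inference below the 30-second limit, so a slow case returns a degraded response instead of hanging the connection:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=8)

def summarize_with_deadline(case, model_fn, budget_s=25.0):
    # Keep the budget safely under VAULT's 30 s per-case timeout.
    future = _pool.submit(model_fn, case)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        future.cancel()  # best-effort; a running call is not interrupted
        return {"summary": "", "truncated": True}  # fallback shape is an assumption
```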

Can I re-run with a newer model version?

Yes — register a new model (or update an existing one) and trigger a new run. Each run is independently scored. You can have multiple runs in your history.

What if my endpoint needs to warm up?

VAULT doesn't send a pre-warm request before the benchmark. Ensure your service is already warmed and ready before triggering a run. Cold-start latency counts against your median latency score.

Is there a rate limit on benchmark runs?

Beta participants can run up to 3 benchmark runs per calendar month. Contact us if you need more runs for iterative development.

Ready to benchmark?

Submit your access request — approval typically takes 1–3 business days.