Rate limits

Rate limits cap how many requests you can send to our services within a given time window. They serve as guardrails that keep WriftAI stable, accessible, and secure.

Why do we use rate limits?

Rate limits are standard in modern APIs, and we rely on them for several important reasons:

Protection against misuse: Without limits, someone could intentionally or accidentally send an overwhelming number of requests, degrading service or causing outages. Rate limiting contains both deliberate attacks and accidental traffic spikes.
Fair usage for all: By capping how many requests a single user or application can make, we prevent any one party from consuming disproportionate resources. This ensures that the system remains responsive and reliable for everyone.
System stability and performance: Traffic patterns can change quickly. Rate limits help us keep our infrastructure running smoothly, even during periods of heavy demand, by preventing unexpected overloads.

Limits by service

WriftAI rate limits are defined per service and per authentication state.

Need higher limits?

For higher or custom rate limits, contact support with your expected requests per second (RPS), the endpoints you call, and your traffic pattern.

API

Base request limits

Client type	Limit	Bursting
Unauthenticated	1 request / second	Short bursts allowed
Authenticated	3 requests / second	Short bursts allowed

High-throughput endpoint

Endpoint	Limit	Bursting
Prediction creation	15 requests / second	Short bursts allowed

Model Registry

Operation	Limit
Model pulls	30 pulls / hour

Rate limit headers

Rate limit headers are returned on responses so you can track usage and implement safe retries.

Header	Meaning
`x-ratelimit-limit`	Maximum number of requests allowed in the current window
`x-ratelimit-remaining`	Number of requests remaining in the current window
`x-ratelimit-reset`	When the current window resets (format differs by service)
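
As an illustration of tracking usage from these headers, here is a minimal Python sketch that reads the three values from a response's header mapping. The names `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical, and the sketch assumes the values arrive as plain numeric strings, which this page does not specify.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]      # from x-ratelimit-limit
    remaining: Optional[int]  # from x-ratelimit-remaining
    reset: Optional[float]    # from x-ratelimit-reset, raw numeric value


def _to_number(value, cast):
    """Return cast(value), or None if the header is missing or malformed."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=_to_number(lowered.get("x-ratelimit-limit"), int),
        remaining=_to_number(lowered.get("x-ratelimit-remaining"), int),
        # Assumption: the reset value is numeric; its meaning depends on the service.
        reset=_to_number(lowered.get("x-ratelimit-reset"), float),
    )
```

A caller could check `info.remaining` before sending more work and pause when it reaches zero.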

`x-ratelimit-reset` format

The `x-ratelimit-reset` value differs by service:

Service	Reset value
API	Unix timestamp (seconds since epoch)
Other services (e.g., Model Registry)	Seconds until reset
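
Because the two formats mean different things, client code has to normalize them before waiting. The following Python sketch converts either form into a number of seconds to wait. The function name and the `service` label (`"api"` versus anything else) are assumptions made for this example, not part of the WriftAI API.

```python
import time
from typing import Optional


def seconds_until_reset(reset_value: float, service: str,
                        now: Optional[float] = None) -> float:
    """Convert an x-ratelimit-reset value into a non-negative wait in seconds.

    `service` is a hypothetical label: "api" means the value is a Unix
    timestamp; any other label means the value is already seconds until reset.
    """
    if service == "api":
        current = time.time() if now is None else now
        wait = reset_value - current
    else:
        wait = reset_value
    # Clock skew or an already-elapsed window can produce a negative number.
    return max(0.0, wait)
```

The optional `now` argument exists only to make the function easy to test; in normal use it can be omitted.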

Handling rate limit errors

If you exceed a limit, the service responds with HTTP 429 (Too Many Requests). When this happens:

Use `x-ratelimit-reset` to determine when to retry, interpreting the value according to the service's format.
Prefer exponential backoff for repeated retries.
Avoid sustained bursts even if short spikes are allowed.
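
The guidance above could be combined roughly as follows. This is a sketch under stated assumptions, not a reference client: `send` is a hypothetical callable that performs one request and returns `(status, headers)`, and `wait_from_headers` is a hypothetical callable that turns the response headers into seconds until reset (or `None` if unknown). The base delay, cap, and jitter amount are arbitrary illustrative values.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers); hypothetical shape


def retry_delay(attempt: int, reset_wait: Optional[float],
                base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based).

    Uses capped exponential backoff; if the time until the window resets
    is known, waits at least that long.
    """
    backoff = min(cap, base * (2 ** attempt))
    if reset_wait is not None:
        return max(backoff, reset_wait)
    return backoff


def call_with_retries(
    send: Callable[[], Response],
    wait_from_headers: Callable[[Mapping[str, str]], Optional[float]],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Response:
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        status, headers = send()
        # Return anything that is not a 429, or the final 429 once retries run out.
        if status != 429 or attempt == max_retries:
            return status, headers
        delay = retry_delay(attempt, wait_from_headers(headers))
        # A little random jitter keeps many clients from retrying in lockstep.
        sleep(delay + 0.1 * jitter())
    raise AssertionError("unreachable")
```

Passing `sleep` and `jitter` as parameters keeps the retry logic testable without real waiting.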

Best practices

Authenticate requests to get higher default API throughput.
Spread traffic evenly where possible instead of sending spikes.
Batch work when supported rather than sending many small requests.
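
One way to spread traffic evenly is to pace requests on the client before they are sent. The Python sketch below uses a token bucket, a common pacing technique chosen here for illustration; it is not a description of how WriftAI enforces limits. The `rate` and `capacity` values are assumptions you would tune yourself, since this page does not state how large a permitted burst is.

```python
import time


class TokenBucket:
    """Client-side pacing: about `rate` requests per second on average,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last update, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token is available (0.0 if one is ready)."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)
```

For example, an authenticated client might create `TokenBucket(rate=3, capacity=3)` and call `try_acquire()` before each request, sleeping for `wait_time()` whenever it returns `False`.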
