Rate limits
Understand WriftAI rate limits and restrictions.
Rate limits define how often you can interact with our services over a certain period. They serve as guardrails that ensure WriftAI remains stable, accessible, and secure.
Why do we use rate limits?
Rate limits are standard in modern APIs, and we rely on them for several important reasons:
- Protection against misuse: Without limits, someone could intentionally or accidentally send an overwhelming number of requests, potentially degrading service or causing outages. Rate limiting helps prevent those kinds of attacks or spikes in traffic.
- Fair usage for all: By capping how many requests a single user or application can make, we prevent any one party from consuming disproportionate resources. This ensures that the system remains responsive and reliable for everyone.
- System stability and performance: Traffic patterns can change quickly. Rate limits help us keep our infrastructure running smoothly, even during periods of heavy demand, by preventing unexpected overloads.
Limits by service
WriftAI rate limits are defined per service and per authentication state.
Need higher limits?
For higher or custom rate limits, contact support with your expected requests per second (RPS), the endpoints you call, and your traffic pattern.
API
Base request limits
| Client type | Limit | Bursting |
|---|---|---|
| Unauthenticated | 1 request / second | Short bursts allowed |
| Authenticated | 3 requests / second | Short bursts allowed |
High-throughput endpoint
| Endpoint | Limit | Bursting |
|---|---|---|
| Prediction creation | 15 requests / second | Short bursts allowed |
Model Registry
| Operation | Limit |
|---|---|
| Model pulls | 30 pulls / hour |
Rate limit headers
Rate limit headers are returned on responses so you can track usage and implement safe retries.
| Header | Meaning |
|---|---|
| x-ratelimit-limit | The maximum allowed requests in the current window |
| x-ratelimit-remaining | The remaining requests in the current window |
| x-ratelimit-reset | When the current window resets |
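As a rough illustration of tracking usage from these headers, the sketch below collects the three values into a small record. The names `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical and not part of any WriftAI SDK, and the code assumes the header values are plain numeric strings.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Union


@dataclass(frozen=True)
class RateLimitInfo:
    """Hypothetical container for the three rate limit headers."""
    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[float]  # raw value; its meaning depends on the service


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Read the rate limit headers, tolerating missing or malformed values."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}

    def read(name: str, cast: Callable[[str], Union[int, float]]):
        raw = lowered.get(name)
        if raw is None:
            return None
        try:
            return cast(raw)
        except ValueError:
            # Assumption: values are numeric; anything else is treated as absent.
            return None

    return RateLimitInfo(
        limit=read("x-ratelimit-limit", int),
        remaining=read("x-ratelimit-remaining", int),
        reset=read("x-ratelimit-reset", float),
    )
```

A client could, for example, slow down when `remaining` approaches zero instead of waiting for a 429 response.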
x-ratelimit-reset format
The x-ratelimit-reset value differs by service:
| Service | Reset value |
|---|---|
| API | Unix timestamp (seconds since epoch) |
| Other services (e.g., Model Registry) | Seconds until reset |
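Because the two formats need different handling, a client may want to normalize the value into "seconds to wait". The following is a minimal sketch of that conversion; the function name and the `is_api` flag are illustrative assumptions, not part of a WriftAI client library.

```python
import time
from typing import Optional


def seconds_until_reset(
    reset_value: str, is_api: bool, now: Optional[float] = None
) -> float:
    """Convert an x-ratelimit-reset value into a non-negative wait in seconds.

    is_api=True  -> value is a Unix timestamp (seconds since epoch).
    is_api=False -> value is already a number of seconds until reset.
    """
    value = float(reset_value)
    if is_api:
        current = time.time() if now is None else now
        # A timestamp in the past means the window has already reset.
        return max(0.0, value - current)
    return max(0.0, value)
```

The optional `now` argument exists only to make the function easy to test; in normal use it can be omitted.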
Handling rate limit errors
If you exceed a limit, the service responds with HTTP 429 (Too Many Requests). When this happens:
- Use x-ratelimit-reset to determine when to retry.
- Prefer exponential backoff for repeated retries.
- Avoid sustained bursts even if short spikes are allowed.
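One way to combine these recommendations is sketched below: retry on HTTP 429, wait at least as long as x-ratelimit-reset suggests, and otherwise back off exponentially with a little jitter. Everything here is an assumption for illustration: `send` is a hypothetical callable that performs one request and returns `(status, headers)` with lowercase header names, and the base delay, cap, and attempt count are arbitrary defaults rather than WriftAI recommendations.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2**attempt, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(
    send: Callable[[], Response],
    max_attempts: int = 5,
    reset_is_timestamp: bool = True,  # True for the API, False for other services
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> Response:
    """Call `send` until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt == max_attempts - 1:
            break  # out of attempts; return the last 429 to the caller
        delay = backoff_delay(attempt)
        reset = headers.get("x-ratelimit-reset")
        if reset is not None:
            value = float(reset)
            wait = value - now() if reset_is_timestamp else value
            # Never retry before the window is expected to reset.
            delay = max(delay, wait)
        # Small random jitter so many clients do not retry in lockstep.
        sleep(delay + random.uniform(0.0, 0.1))
    return status, headers
```

Passing `sleep` and `now` as parameters is a design choice that keeps the retry loop testable without real waiting; production code can rely on the defaults.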
Best practices
- Authenticate requests to get higher default API throughput.
- Spread traffic evenly where possible instead of sending spikes.
- Batch work when supported rather than sending many small requests.
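To illustrate spreading traffic evenly, the sketch below paces calls on the client side so that requests go out at a fixed interval rather than in spikes. The `Pacer` class is a hypothetical helper, not a WriftAI feature, and it deliberately does not model server-side bursting.

```python
import time
from typing import Callable, Optional


class Pacer:
    """Space calls so they are issued at most `rate_per_second` times per second."""

    def __init__(
        self,
        rate_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate_per_second <= 0:
            raise ValueError("rate_per_second must be positive")
        self.interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_slot: Optional[float] = None

    def wait(self) -> float:
        """Block until the next send slot and return the time slept in seconds."""
        now = self._clock()
        if self._next_slot is None or now >= self._next_slot:
            # Idle long enough: send immediately and schedule the next slot.
            delay = 0.0
            self._next_slot = now + self.interval
        else:
            # Too early: wait for the reserved slot, then reserve the one after.
            delay = self._next_slot - now
            self._next_slot += self.interval
        if delay > 0:
            self._sleep(delay)
        return delay
```

For example, an authenticated client could create `Pacer(3)` and call `wait()` before each request to stay near the 3 requests / second base limit.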