How to handle WriftAI rate limits

Jul 1, 2026

Muhammad Mustafa

When your requests start returning 429, the naive fix is to sleep and retry. With a fixed sleep interval, you either wait longer than necessary or wake up at the same moment as every other worker and hit the limit again.

WriftAI rate limits both the API and the Model Registry. They have always been enforced. Until now, the only way to discover them was to hit a 429 and read the response. They are now documented at wrift.ai/docs/more-information/rate-limits.

What the limits are

API limits are tiered by endpoint and by authentication. Model Registry pulls have a separate limit. The numbers may change, so check the docs linked above for current values.

The response headers

Every response includes three rate limit headers:


    x-ratelimit-limit: 3
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 1751414820

x-ratelimit-limit is the maximum number of requests allowed in the current window. x-ratelimit-remaining is how many requests you have left in that window. x-ratelimit-reset is when the window resets.

The format of x-ratelimit-reset is not the same for every service. For API endpoints, it is a Unix timestamp. For the Model Registry, it is seconds until reset. The wait calculation is different in each case.
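A minimal sketch of the two wait calculations, to make the difference concrete. The helper name seconds_until_reset and the is_registry flag are hypothetical and not part of any WriftAI client; only the header semantics come from the description above.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(
    headers: Mapping[str, str],
    is_registry: bool,
    now: Optional[float] = None,
) -> float:
    """Return how many seconds to wait before the rate limit window resets.

    Hypothetical helper: the name and the is_registry flag are illustrative.
    Assumes the header key is lowercase, as in a plain dict; the headers
    object from the requests library matches keys case-insensitively.
    """
    reset = float(headers["x-ratelimit-reset"])
    if is_registry:
        # Model Registry: the header already holds seconds until reset.
        return max(0.0, reset)
    # API endpoints: the header is a Unix timestamp, so subtract the current time.
    current = time.time() if now is None else now
    return max(0.0, reset - current)
```

The now parameter exists only so the calculation can be checked against a fixed clock; in normal use it can be left out.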

Using the official clients

If you are using one of the official clients, this is already handled. The clients retry automatically on 429 and other transient errors. On rate limit responses, they wait until the x-ratelimit-reset window expires. On other transient errors, they use exponential backoff. The default is 2 retries.

The retry count is configurable at client initialization. Set it higher if your workload needs more attempts, or 0 to disable retries entirely.
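The exact backoff schedule the official clients use is not described here. As a rough illustration of exponential backoff in general, the sketch below doubles the delay on each retry. The function name, the 0.5 second base delay, and the 8 second cap are assumed values chosen for the example, not the clients' actual settings.

```python
from typing import List


def backoff_delays(max_retries: int = 2, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Return the delay in seconds before each retry, doubling every time.

    Illustrative only: base and cap are assumptions, not WriftAI client defaults.
    """
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


print(backoff_delays())   # [0.5, 1.0]
print(backoff_delays(5))  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

With the default of 2 retries this yields two delays; passing 0 yields an empty list, which corresponds to no retries at all.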

Without an official client

If you are not using an official client, catch 429 responses, read x-ratelimit-reset, and sleep until the window expires before retrying. Do not guess at a fixed sleep duration. If you are running multiple workers, add jitter to the wait. Without it, every worker wakes at the same reset moment, retries at once, and hits the limit again. This is the thundering herd problem.
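To see why jitter helps, the sketch below computes wake-up times for several workers that all read the same reset timestamp. The function name, the worker count, and the one-second jitter range are assumptions made for illustration.

```python
import random
from typing import List, Optional


def wake_times(
    reset_at: float,
    workers: int,
    max_jitter: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the moment each worker would retry after a shared reset time."""
    if rng is None:
        rng = random.Random()
    # Each worker adds its own random offset in [0, max_jitter] seconds.
    return [reset_at + rng.uniform(0, max_jitter) for _ in range(workers)]


# Without jitter, every worker retries at exactly the same instant.
no_jitter = wake_times(1751414820, workers=4, max_jitter=0.0)

# With jitter, the retries are spread across the second after the reset.
with_jitter = wake_times(1751414820, workers=4, rng=random.Random(0))
```

A wider jitter range spreads the retries further at the cost of a slightly longer average wait.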

API endpoints


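    import time
    import random
    import requests

    # payload and WRIFTAI_ACCESS_TOKEN are assumed to be defined elsewhere.
    for _ in range(5):  # up to 5 attempts
        response = requests.post(
            "https://api.wrift.ai/v1/hardware",
            json=payload,
            headers={"Authorization": f"Bearer {WRIFTAI_ACCESS_TOKEN}"},
        )

        if response.status_code != 429:
            # Not rate limited: raise on any other error, otherwise stop retrying.
            response.raise_for_status()
            break

        # x-ratelimit-reset is a Unix timestamp here: wait until it passes, plus jitter.
        wait = max(0, float(response.headers["x-ratelimit-reset"]) - time.time())
        time.sleep(wait + random.uniform(0, 1))
    else:
        # Every attempt returned 429: surface the error.
        response.raise_for_status()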

Model Registry endpoints

For registry responses, x-ratelimit-reset is seconds until reset, not a Unix timestamp:


    wait = float(response.headers["x-ratelimit-reset"])
    time.sleep(wait + random.uniform(0, 1))
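Put together, a registry retry loop might look like the sketch below. It takes the request as a callable so that it does not assume any particular registry URL or HTTP library. The function name and the send, max_attempts, and sleep parameters are hypothetical; the response object is only assumed to expose status_code, headers, and raise_for_status(), as a requests response does.

```python
import random
import time
from typing import Any, Callable


def call_registry_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it stops returning 429 or attempts run out.

    Hypothetical helper: send is any zero-argument callable that performs
    the registry request and returns a requests-style response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            # Not rate limited: raise on other errors, otherwise return.
            response.raise_for_status()
            return response
        if attempt < max_attempts - 1:
            # Registry responses report seconds until reset, so sleep for
            # that long directly, plus up to one second of jitter.
            wait = float(response.headers["x-ratelimit-reset"])
            sleep(wait + random.uniform(0, 1))

    # Every attempt was rate limited: surface the final 429.
    response.raise_for_status()
    return response
```

Passing sleep as a parameter is a design choice for testability; in production the default time.sleep is what actually blocks. Unlike a loop that sleeps after every 429, this version skips the wait after the final attempt, since no further retry follows it.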

Custom limits

The defaults work for most workloads. If you need more, contact support with your expected requests per second, the endpoints you are hitting, and your traffic pattern. We handle high-throughput accounts separately.



© 2026 Sych Inc.