Rate Limits
Aqta implements rate limits to ensure fair usage and system stability. Rate limits vary by tier and are enforced per API key.
Rate Limits by Tier
| Tier | Requests/Month | Rate Limit | Burst Limit | Models |
|---|---|---|---|---|
| Free | 500 | 5/min | 10 | GPT-3.5-turbo, Claude 3 Haiku, Gemini Flash |
| Starter | 100,000 | 100/min | 200 | All models |
| Pro | 1,000,000 | 1,000/min | 2,000 | All models |
| Healthcare | 1,000,000 | 1,000/min | 2,000 | All models |
What is Burst Limit?
The burst limit lets you temporarily exceed your per-minute rate for short periods. For example, the Free tier allows a burst of 10 requests, even though its sustained rate is 5/min.
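This page doesn't say exactly how burst and sustained limits interact on the server. A token bucket is one common way to model "up to N requests at once, then a steady rate", and the sketch below uses one as a client-side throttle so your own code stays within the Free-tier numbers from the table. The `TokenBucket` class and its parameters are illustrative, not part of any Aqta SDK.

```python
import time


class TokenBucket:
    """Client-side throttle: holds up to `capacity` tokens, refilled at `rate_per_min`."""

    def __init__(self, capacity, rate_per_min, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = rate_per_min / 60.0
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def try_acquire(self):
        """Spend one token if available; True means the request may be sent now."""
        now = self.clock()
        # Refill for the time elapsed since the last call, never beyond capacity
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Free-tier numbers from the table (burst 10, sustained 5/min), driven by a fake clock
fake_now = [0.0]
bucket = TokenBucket(capacity=10, rate_per_min=5, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(11)]
print(burst.count(True))  # 10: the 11th immediate request is refused
fake_now[0] = 15.0        # 15 s at 5/min refills 1.25 tokens
later = bucket.try_acquire()
print(later)              # True
```

Calling `try_acquire()` before each API request and sleeping briefly when it returns `False` keeps bursts and sustained traffic under your tier's numbers without waiting for a 429.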
Model Restrictions
Free Tier:
- ✅ GPT-3.5-turbo
- ✅ Claude 3 Haiku
- ✅ Gemini Flash
- ❌ GPT-4 (upgrade to Starter)
- ❌ Claude 3.5 Sonnet (upgrade to Starter)
- ❌ GPT-4 Turbo (upgrade to Starter)
Starter, Pro, Healthcare:
- ✅ All models available
Why? The Free tier is limited to cost-effective models to keep the service sustainable while still letting you test the platform.
How Rate Limits Work
Per-Minute Limits
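Per-minute limits are calculated over a sliding 60-second window, so each request counts against your limit for 60 seconds after it is sent:

| Time | 0s | 10s | 20s | 30s | 40s | 50s | 60s |
|---|---|---|---|---|---|---|---|
| Requests sent | 5 | 2 | 3 | 0 | 0 | 0 | 0 |
| Total in last 60s | 5 | 7 | 10 | 10 | 10 | 10 | 10 |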
If you hit the limit, you'll receive a 429 Too Many Requests error.
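To make the sliding window concrete, here is a minimal sliding-window log that replays the example above. It assumes a hypothetical limit of 10 requests per 60-second window and treats a request that is exactly 60 s old as still inside the window, which is what the 60s column shows; Aqta's server-side implementation may differ.

```python
from collections import deque


class SlidingWindowLog:
    """Allow a request only if fewer than `limit` were recorded in the last `window` seconds."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def count(self, now):
        # Evict timestamps older than the window; a request exactly `window`
        # seconds old still counts, matching the 60s column in the example
        while self.timestamps and self.timestamps[0] < now - self.window:
            self.timestamps.popleft()
        return len(self.timestamps)

    def allow(self, now):
        if self.count(now) < self.limit:
            self.timestamps.append(now)
            return True
        return False  # over the limit: the API would return 429 here


log = SlidingWindowLog(limit=10)
schedule = [(0, 5), (10, 2), (20, 3), (30, 0), (40, 0), (50, 0), (60, 0)]
totals = []
for t, n in schedule:
    for _ in range(n):
        log.allow(t)
    totals.append(log.count(t))
print(totals)         # [5, 7, 10, 10, 10, 10, 10]
print(log.allow(60))  # False: the window is full
print(log.count(61))  # 5: the requests sent at 0s have aged out
```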
Monthly Limits
Monthly limits reset on the 1st of each month at 00:00 UTC.
Example:
- Free tier: 500 requests/month
- If you use all 500 requests by January 15th, you'll be rate limited until February 1st at 00:00 UTC
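Because the monthly quota resets on the 1st of each month at 00:00 UTC, the moment a depleted quota comes back can be computed from the current time. A minimal standard-library sketch; `next_monthly_reset` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def next_monthly_reset(now):
    """Return the next 1st-of-month 00:00 UTC after `now` (a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


reset = next_monthly_reset(datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc))
print(reset.isoformat())  # 2026-02-01T00:00:00+00:00
```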
Rate Limit Headers
Every API response includes rate limit headers:
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1677652348
```
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per minute |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp (in seconds) when the current window resets |
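A small helper can turn these three headers into something easier to act on. This sketch assumes the values are plain integers, as in the example response, and matches header names case-insensitively because HTTP header names are case-insensitive. `RateLimitStatus` and `parse_rate_limit_headers` are illustrative names, not part of an Aqta SDK.

```python
import time
from dataclasses import dataclass


@dataclass
class RateLimitStatus:
    limit: int      # X-RateLimit-Limit: maximum requests per minute
    remaining: int  # X-RateLimit-Remaining: requests left in the current window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp (seconds) of the next reset

    def seconds_until_reset(self, now=None):
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers):
    """Build a RateLimitStatus from any mapping of header names to values."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )


status = parse_rate_limit_headers({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "95",
    "X-RateLimit-Reset": "1677652348",
})
print(status.remaining, status.seconds_until_reset(now=1677652318))  # 95 30
```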
Handling Rate Limits
429 Error Response
When you exceed the rate limit:
```json
{
  "error": {
    "message": "Rate limit exceeded. Try again in 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 30
  }
}
```
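The wait time after a 429 can come from two places: the `Retry-After` header that the Python retry example on this page reads, or the `retry_after` field in the JSON body above. The page doesn't say whether every 429 carries both, so this sketch (hypothetical helper `retry_delay`) prefers the header, falls back to the body, and finally uses a 60-second default.

```python
import json


def retry_delay(body_text, headers=None, default=60.0):
    """Seconds to wait after a 429: Retry-After header, else error.retry_after, else default."""
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    if "retry-after" in headers:
        try:
            return float(headers["retry-after"])
        except ValueError:
            pass  # e.g. the HTTP-date form of Retry-After; fall back to the body
    try:
        return float(json.loads(body_text)["error"]["retry_after"])
    except (ValueError, KeyError, TypeError):
        return default  # body missing, not JSON, or without a retry_after field


body = '{"error": {"message": "Rate limit exceeded. Try again in 30 seconds.", "type": "rate_limit_error", "code": "rate_limit_exceeded", "retry_after": 30}}'
print(retry_delay(body))                        # 30.0
print(retry_delay(body, {"Retry-After": "5"}))  # 5.0
print(retry_delay("not json"))                  # 60.0
```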
Retry Logic (Python)
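```python
import time

from openai import OpenAI, RateLimitError

# max_retries=0 disables the SDK's built-in retries so this loop controls retry timing.
# The client reads OPENAI_API_KEY (and OPENAI_BASE_URL, if set) from the environment.
client = OpenAI(max_retries=0)


def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4",
                messages=messages,
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the 429 to the caller
            # Wait as long as the Retry-After header asks, defaulting to 60s
            retry_after = float(e.response.headers.get("retry-after", 60))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)


response = make_request_with_retry([
    {"role": "user", "content": "Hello!"}
])
```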
Exponential Backoff (JavaScript)
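```javascript
import OpenAI from 'openai';

// Disable the SDK's built-in retries so this function controls all retry timing
const openai = new OpenAI({ maxRetries: 0 });

async function makeRequestWithBackoff(messages, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await openai.chat.completions.create({
        model: 'gpt-4',
        messages,
      });
    } catch (error) {
      if (error.status !== 429 || attempt === maxRetries - 1) {
        throw error; // not a rate limit, or out of retries
      }
      // Retry-After is in seconds; Number() yields NaN if the header is missing
      const retryAfter = Number(error.headers?.['retry-after']);
      const backoff = 2 ** attempt * 1000; // exponential backoff: 1s, 2s, 4s...
      // Honour Retry-After when present, but never wait less than the backoff
      const delay = Number.isFinite(retryAfter) ? Math.max(retryAfter * 1000, backoff) : backoff;
      console.log(`Rate limited. Retrying in ${delay / 1000}s...`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Top-level await requires an ES module
const response = await makeRequestWithBackoff([
  { role: 'user', content: 'Hello!' },
]);
```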
Best Practices
1. Monitor Rate Limit Headers
Always check X-RateLimit-Remaining to avoid hitting limits:
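```python
from openai import OpenAI

client = OpenAI()
# with_raw_response exposes the HTTP headers; .parse() returns the usual ChatCompletion
raw = client.chat.completions.with_raw_response.create(model="gpt-4", messages=[{"role": "user", "content": "Hello!"}])
completion = raw.parse()
remaining = int(raw.headers.get("X-RateLimit-Remaining", 0))
if remaining < 10:
    print(f"Warning: Only {remaining} requests remaining!")
```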
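Instead of only warning when few requests are left, a client can pace itself by spreading its remaining requests across the time left in the window. A minimal sketch, assuming you have already read `X-RateLimit-Remaining` and `X-RateLimit-Reset` (Unix seconds) from a response; `pacing_delay` is a hypothetical helper.

```python
import time


def pacing_delay(remaining, reset_at, now=None):
    """Seconds to wait before the next request so the remaining quota
    lasts until the window resets at `reset_at` (Unix seconds)."""
    now = time.time() if now is None else now
    seconds_left = max(0.0, float(reset_at - now))
    if remaining <= 0:
        return seconds_left          # quota spent: wait for the reset
    return seconds_left / remaining  # otherwise space requests out evenly


print(pacing_delay(remaining=4, reset_at=1000, now=980))  # 5.0
print(pacing_delay(remaining=0, reset_at=1000, now=980))  # 20.0
```

Sleeping for this delay between calls trades a little latency for never hitting the limit; the threshold check shown above is the simpler alternative.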
2. Implement Exponential Backoff
Use exponential backoff when retrying failed requests:
Attempt 1: Wait 1s
Attempt 2: Wait 2s
Attempt 3: Wait 4s
Attempt 4: Wait 8s
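The schedule above doubles the wait after each failed attempt. A small helper that produces it, with two common refinements that are assumptions rather than documented Aqta behaviour: a cap so waits don't grow without bound, and optional "full jitter" (a random wait between 0 and the computed delay) so many clients throttled at the same moment don't all retry in lockstep.

```python
import random


def backoff_delays(max_attempts, base=1.0, cap=60.0, jitter=False, rng=random.random):
    """Delay before each retry: base * 2**attempt, capped at `cap`.
    With jitter=True each delay is drawn uniformly from [0, delay)."""
    delays = []
    for attempt in range(max_attempts):
        delay = min(cap, base * 2 ** attempt)
        if jitter:
            delay *= rng()  # full jitter
        delays.append(delay)
    return delays


print(backoff_delays(4))            # [1.0, 2.0, 4.0, 8.0]
print(backoff_delays(8, cap=30.0))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0, 30.0]
```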
3. Batch Requests
Instead of making many small requests, batch them when possible:
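```python
from openai import OpenAI

client = OpenAI()
items = ["order-1", "order-2", "order-3"]  # hypothetical inputs

# Bad: one request per item, each counting toward your limit
for item in items:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Process {item}"}],
    )

# Good: 1 request with batched content
content = "\n".join(f"Process {item}" for item in items)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": content}],
)
```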
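Packing every item into a single prompt can run into the model's context window, so a middle ground is to send items in fixed-size groups. This sketch builds one prompt per group of `batch_size` items (a hypothetical parameter; choose a size your prompts and model can handle), so 100 items cost 4 requests instead of 100 at `batch_size=25`. Each prompt would then be sent with the same chat completions call as in the "Good" example.

```python
from itertools import islice


def batched_prompts(items, batch_size):
    """Yield one prompt per group of up to `batch_size` items."""
    it = iter(items)
    while chunk := list(islice(it, batch_size)):
        yield "\n".join(f"Process {item}" for item in chunk)


items = [f"item-{i}" for i in range(1, 101)]  # hypothetical inputs
prompts = list(batched_prompts(items, batch_size=25))
print(len(prompts))  # 4
```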
4. Use Caching
Cache responses to avoid redundant requests:
```python
import functools

from openai import OpenAI

client = OpenAI()


@functools.lru_cache(maxsize=100)
def get_completion(prompt):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Repeated calls with the same prompt are served from the in-process cache
result1 = get_completion("What is AI?")
result2 = get_completion("What is AI?")  # Uses cache, no API request
```
5. Upgrade Your Tier
If you consistently hit rate limits, consider upgrading:
- Free → Starter: 200x more monthly requests and 20x the per-minute rate (€49/month)
- Starter → Pro: 10x more requests (€149/month)
- Pro → Healthcare: Same limits, but HIPAA compliance (€999/month)
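The tier table can be turned into a quick check of which tier a workload needs. This sketch compares only monthly volume and peak requests per minute; it ignores burst limits, the Free tier's model restrictions and Healthcare's HIPAA compliance (Healthcare has the same limits as Pro, so it is omitted). `smallest_tier` is a hypothetical helper.

```python
# Limits copied from the "Rate Limits by Tier" table
TIERS = [
    ("Free", 500, 5),
    ("Starter", 100_000, 100),
    ("Pro", 1_000_000, 1_000),
]


def smallest_tier(monthly_requests, peak_per_minute):
    """Return the first tier whose monthly and per-minute limits both cover the workload."""
    for name, monthly, per_min in TIERS:
        if monthly_requests <= monthly and peak_per_minute <= per_min:
            return name
    return None  # no listed tier covers this workload


print(smallest_tier(400, 5))       # Free
print(smallest_tier(30_000, 50))   # Starter
print(smallest_tier(30_000, 300))  # Pro
```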
Special Cases
Streaming Requests
Streaming requests count as a single request, regardless of how long the stream is open.
Failed Requests
Failed requests (4xx, 5xx errors) still count toward your rate limit.
Webhook Callbacks
Webhook callbacks from Aqta do not count toward your rate limit.
Monitoring Usage
Dashboard
View your usage in real time at app.aqta.ai/analytics:
- Requests per minute (current)
- Requests this month (total)
- Remaining requests
- Rate limit status
API
Get usage programmatically:
```bash
curl https://api.aqta.ai/v1/usage \
  -H "Authorization: Bearer sk-aqta-your-key-here"
```
Response:
```json
{
  "current_period": {
    "start": "2026-02-01T00:00:00Z",
    "end": "2026-03-01T00:00:00Z",
    "requests": 5432,
    "limit": 10000,
    "remaining": 4568
  },
  "rate_limit": {
    "requests_per_minute": 10,
    "current_usage": 3,
    "remaining": 7
  }
}
```
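The same call can be made from Python with only the standard library, and the response reduced to the numbers you usually act on. A sketch that reuses the endpoint and Bearer header from the curl example; `fetch_usage` and `summarize_usage` are hypothetical helper names, and `sample` is the response shown above.

```python
import json
import urllib.request

USAGE_URL = "https://api.aqta.ai/v1/usage"


def fetch_usage(api_key):
    """GET the usage endpoint (same request as the curl example) and return parsed JSON."""
    req = urllib.request.Request(USAGE_URL, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_usage(usage):
    """Percent of the monthly quota used, plus remaining monthly and per-minute requests."""
    period = usage["current_period"]
    rate = usage["rate_limit"]
    return {
        "monthly_used_pct": round(100 * period["requests"] / period["limit"], 2),
        "monthly_remaining": period["remaining"],
        "minute_remaining": rate["remaining"],
    }


sample = {
    "current_period": {"start": "2026-02-01T00:00:00Z", "end": "2026-03-01T00:00:00Z",
                       "requests": 5432, "limit": 10000, "remaining": 4568},
    "rate_limit": {"requests_per_minute": 10, "current_usage": 3, "remaining": 7},
}
print(summarize_usage(sample))
# {'monthly_used_pct': 54.32, 'monthly_remaining': 4568, 'minute_remaining': 7}
```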
Upgrading Your Tier
To increase your rate limits:
- Visit app.aqta.ai/pricing
- Select a higher tier
- Complete payment
- Your new limits apply immediately
No downtime: Rate limit changes are applied instantly.
FAQ
Q: What happens if I exceed my monthly limit?
A: You'll receive 429 errors until the next billing cycle (1st of the month). Upgrade your tier to increase limits immediately.
Q: Can I purchase additional requests?
A: Not currently. Upgrade to a higher tier for more requests.
Q: Do rate limits apply per API key or per account?
A: Per API key. Create multiple keys to distribute load.
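If you do spread traffic over several keys, a simple round-robin keeps their per-key counters roughly even. A sketch with placeholder key strings; `KeyRotator` is a hypothetical helper, and each key it returns would be passed as `api_key` when creating a client.

```python
import itertools


class KeyRotator:
    """Hand out API keys round-robin so load spreads evenly across per-key limits."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one API key")
        self._cycle = itertools.cycle(keys)

    def next_key(self):
        return next(self._cycle)


rotator = KeyRotator(["sk-aqta-key-a", "sk-aqta-key-b"])  # hypothetical keys
print([rotator.next_key() for _ in range(4)])
# ['sk-aqta-key-a', 'sk-aqta-key-b', 'sk-aqta-key-a', 'sk-aqta-key-b']
```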
Q: What if I need higher limits temporarily?
A: Contact support@aqta.ai. We can temporarily increase limits for special events.
Q: Are rate limits enforced during free trials?
A: Yes. Free tier limits apply during trials.
Next Steps
- Authentication - Get your API key
- API Endpoints - Available endpoints
- Pricing - Upgrade your tier
Questions? Contact us at hello@aqta.ai