Rate Limits

Aqta implements rate limits to ensure fair usage and system stability. Rate limits vary by tier and are enforced per API key.

Rate Limits by Tier

Tier	Requests/Month	Rate Limit	Burst Limit	Models
Free	500	5/min	10	Cost-effective models only
Starter	100,000	100/min	200	All models
Pro	1,000,000	1,000/min	2,000	All models
Healthcare	1,000,000	1,000/min	2,000	All models

What is Burst Limit?

The burst limit is the maximum number of requests you can send in quick succession, even if that temporarily exceeds your sustained rate limit. For example, the Free tier accepts a burst of up to 10 requests, even though its sustained rate is 5/min.

Model Restrictions

Free Tier:

✅ GPT-3.5-turbo
✅ Claude 3 Haiku
✅ Gemini Flash
❌ GPT-4 (upgrade to Starter)
❌ Claude 3.5 Sonnet (upgrade to Starter)
❌ GPT-4 Turbo (upgrade to Starter)

Starter, Pro, Healthcare:

✅ All models available

Why? The Free tier is limited to cost-effective models to keep the service sustainable while still letting you test the platform.

How Rate Limits Work

Per-Minute Limits

Rate limits are calculated using a sliding window:

Time:     0s    10s   20s   30s   40s   50s   60s
Requests: 5     2     3     0     0     0     0
Total:    5     7     10    10    10    10    10 (in last 60s)

If you exceed the limit, the API returns a 429 Too Many Requests error.
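To make the sliding window concrete, here is a minimal client-side sketch in Python. It is not Aqta's server implementation: the class name `SlidingWindowLimiter`, the limit of 10 (chosen to match the total in the timeline above), and the choice to keep a request counted until it is more than 60 seconds old are all assumptions made for illustration.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side approximation of a per-minute sliding window (hypothetical helper)."""

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()  # send times of requests still inside the window

    def try_acquire(self, now=None):
        """Record a request and return True if it fits in the window, else False."""
        now = time.monotonic() if now is None else now
        # Drop requests that are older than the window (boundary handling is an assumption).
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False
        self._timestamps.append(now)
        return True


# Replay the timeline: 5 requests at 0s, 2 at 10s, 3 at 20s.
limiter = SlidingWindowLimiter(limit=10)
accepted = [limiter.try_acquire(now=t) for t in [0] * 5 + [10] * 2 + [20] * 3]
blocked_at_30s = not limiter.try_acquire(now=30)  # window is full
allowed_at_61s = limiter.try_acquire(now=61)      # the requests from 0s have aged out
```

Calling `try_acquire()` before each request lets a client hold back locally instead of spending a request on a 429.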

Monthly Limits

Monthly limits reset on the 1st of each month at 00:00 UTC.

Example:

Free tier: 500 requests/month
If you use all 500 requests by January 15th, further requests are rejected until February 1st
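If your application needs to know when the monthly quota becomes available again, the reset time can be computed locally. The sketch below assumes only the rule stated above (reset on the 1st of each month at 00:00 UTC); the function names are hypothetical.

```python
from datetime import datetime, timezone


def next_monthly_reset(now):
    """Return the next 1st-of-month 00:00 UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


def seconds_until_reset(now=None):
    """Seconds remaining until the monthly quota resets."""
    now = now or datetime.now(timezone.utc)
    return (next_monthly_reset(now) - now).total_seconds()


# A quota exhausted on January 15th becomes available again on February 1st.
reset = next_monthly_reset(datetime(2026, 1, 15, tzinfo=timezone.utc))
```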

Rate Limit Headers

Every API response includes rate limit headers:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1677652348

Header	Description
`X-RateLimit-Limit`	Maximum requests per minute
`X-RateLimit-Remaining`	Remaining requests in current window
`X-RateLimit-Reset`	Unix timestamp when limit resets
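As an illustration, the three headers can be turned into numbers your code can act on. This is a sketch, assuming the headers arrive as a plain mapping of names to string values; `parse_rate_limit_headers` is a hypothetical helper, not part of any SDK.

```python
import time


def parse_rate_limit_headers(headers, now=None):
    """Extract the X-RateLimit-* values from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = int(lowered["x-ratelimit-limit"])
    remaining = int(lowered["x-ratelimit-remaining"])
    reset_at = int(lowered["x-ratelimit-reset"])  # Unix timestamp
    now = time.time() if now is None else now
    return {
        "limit": limit,
        "remaining": remaining,
        "reset_at": reset_at,
        "seconds_until_reset": max(0.0, reset_at - now),
    }


info = parse_rate_limit_headers(
    {
        "X-RateLimit-Limit": "100",
        "X-RateLimit-Remaining": "95",
        "X-RateLimit-Reset": "1677652348",
    },
    now=1677652318,  # fixed clock so the example is reproducible
)
```

Here `info["seconds_until_reset"]` tells you how long to pause once `info["remaining"]` reaches zero.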

Handling Rate Limits

429 Error Response

When you exceed the rate limit:

{
  "error": {
    "message": "Rate limit exceeded. Try again in 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 30
  }
}
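If you work with raw HTTP responses rather than an SDK, the wait time can be read from the `retry_after` field of the error body. The following is a sketch that assumes the body has exactly the shape shown above; the helper name and the 60-second fallback are assumptions.

```python
import json


def retry_delay_from_error(body, default=60.0):
    """Return the wait in seconds suggested by a 429 response body."""
    try:
        error = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        return default  # not JSON, or not the expected shape
    if not isinstance(error, dict) or error.get("code") != "rate_limit_exceeded":
        return default
    retry_after = error.get("retry_after")
    # Guard against a missing or non-numeric value.
    if isinstance(retry_after, (int, float)) and retry_after >= 0:
        return float(retry_after)
    return default


sample_body = """
{
  "error": {
    "message": "Rate limit exceeded. Try again in 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 30
  }
}
"""
delay = retry_delay_from_error(sample_body)
```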

Retry Logic (Python)

import time

from openai import OpenAI, RateLimitError

# Requires openai>=1.0. Assumes the client is configured for Aqta (see Authentication).
client = OpenAI()

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4",
                messages=messages,
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the error to the caller
            # Use the Retry-After header when present; otherwise wait 60s
            retry_after = int(e.response.headers.get("retry-after", 60))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)

response = make_request_with_retry([
    {"role": "user", "content": "Hello!"}
])

Exponential Backoff (JavaScript)

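import OpenAI from 'openai';

// Assumes the client is configured for Aqta (see Authentication).
const openai = new OpenAI();

async function makeRequestWithBackoff(messages, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await openai.chat.completions.create({
        model: 'gpt-4',
        messages,
      });
    } catch (error) {
      // Rethrow anything that is not a rate limit error, and the final failure.
      if (error.status !== 429 || attempt === maxRetries - 1) throw error;

      // Depending on the SDK version, headers are a plain object or a Headers instance.
      const headers = error.headers;
      const rawRetryAfter =
        typeof headers?.get === 'function' ? headers.get('retry-after') : headers?.['retry-after'];
      const retryAfterMs = (Number(rawRetryAfter) || 0) * 1000;
      const backoffMs = 2 ** attempt * 1000; // Exponential backoff: 1s, 2s, 4s, ...
      // Wait at least as long as the server asks; otherwise use the backoff delay.
      const delay = Math.max(retryAfterMs, backoffMs);

      console.log(`Rate limited. Retrying in ${delay / 1000}s...`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

const response = await makeRequestWithBackoff([
  { role: 'user', content: 'Hello!' }
]);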

Best Practices

1. Monitor Rate Limit Headers

Always check X-RateLimit-Remaining to avoid hitting limits:

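# Requires openai>=1.0; `client` is a configured OpenAI() instance.
# with_raw_response exposes the HTTP headers alongside the parsed body.
raw = client.chat.completions.with_raw_response.create(...)
response = raw.parse()

remaining = int(raw.headers.get("X-RateLimit-Remaining", 0))
if remaining < 10:
    print(f"Warning: Only {remaining} requests remaining!")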

2. Implement Exponential Backoff

Use exponential backoff when retrying failed requests:

Attempt 1: Wait 1s
Attempt 2: Wait 2s
Attempt 3: Wait 4s
Attempt 4: Wait 8s
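The schedule above is simply a base delay doubled on every attempt. A minimal sketch of that calculation follows; the 60-second cap and the optional random jitter are additions not described in this guide (jitter is a common way to stop many clients from retrying at the same instant), and the function name is hypothetical.

```python
import random


def backoff_delays(max_attempts, base=1.0, cap=60.0, jitter=False, rng=random):
    """Yield the wait in seconds before each retry: base * 2**attempt, capped."""
    for attempt in range(max_attempts):
        delay = min(cap, base * 2 ** attempt)
        # Optional "full jitter": pick a random wait between 0 and the delay.
        yield rng.uniform(0, delay) if jitter else delay


schedule = list(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
```

Pass each value to `time.sleep()` between attempts, and prefer the server's `retry_after` value when it is longer.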

3. Batch Requests

Instead of making many small requests, batch them when possible:

# Requires openai>=1.0; `client` is a configured OpenAI() instance.
# Bad: 100 separate requests
for item in items:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any model available on your tier
        messages=[{"role": "user", "content": f"Process {item}"}],
    )

# Good: 1 request with batched content
content = "\n".join(f"Process {item}" for item in items)
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": content}],
)
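Putting every item into a single prompt may not be practical for long lists, so a middle ground is to group items into fixed-size batches. The sketch below only builds the prompts and makes no API calls; the batch size of 20 and the numbered-line format are arbitrary assumptions to tune for your model's context window.

```python
def build_batched_prompts(items, batch_size=20):
    """Group items into prompts of at most `batch_size` numbered lines each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    prompts = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        # Number the lines so results can be matched back to their items.
        lines = [f"{start + i + 1}. Process {item}" for i, item in enumerate(chunk)]
        prompts.append("\n".join(lines))
    return prompts


# 100 items become 5 requests instead of 100.
prompts = build_batched_prompts([f"item-{n}" for n in range(100)])
```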

4. Use Caching

Cache responses to avoid redundant requests:

import functools

# Requires openai>=1.0; `client` is a configured OpenAI() instance.
@functools.lru_cache(maxsize=100)
def get_completion(prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any model available on your tier
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Subsequent calls with the same prompt are served from the cache
result1 = get_completion("What is AI?")
result2 = get_completion("What is AI?")  # Uses cache

5. Upgrade Your Tier

If you consistently hit rate limits, consider upgrading:

Free → Starter: 200x more monthly requests and 20x the per-minute rate (€49/month)
Starter → Pro: 10x more requests (€149/month)
Pro → Healthcare: Same limits, plus HIPAA compliance (€999/month)

Rate Limit Exceptions

Streaming Requests

Streaming requests count as a single request, regardless of how long the stream is open.

Failed Requests

Failed requests (4xx, 5xx errors) still count toward your rate limit.

Webhook Callbacks

Webhook callbacks from Aqta do not count toward your rate limit.

Monitoring Usage

Dashboard

View your usage in real-time at app.aqta.ai/analytics:

Requests per minute (current)
Requests this month (total)
Remaining requests
Rate limit status

API

Get usage programmatically:

curl https://api.aqta.ai/v1/usage \
  -H "Authorization: Bearer sk-aqta-your-key-here"

Response:

{
  "current_period": {
    "start": "2026-02-01T00:00:00Z",
    "end": "2026-03-01T00:00:00Z",
    "requests": 5432,
    "limit": 100000,
    "remaining": 94568
  },
  "rate_limit": {
    "requests_per_minute": 100,
    "current_usage": 3,
    "remaining": 97
  }
}
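Once you have this response, a small helper can turn it into an early warning. This is a sketch that assumes the response has exactly the fields shown above; `summarize_usage` and the 80% warning threshold are hypothetical.

```python
def summarize_usage(usage, warn_fraction=0.8):
    """Condense a /v1/usage response (already parsed into a dict) into a few signals."""
    period = usage["current_period"]
    used_fraction = period["requests"] / period["limit"]
    return {
        "used_fraction": used_fraction,
        "monthly_remaining": period["remaining"],
        "minute_remaining": usage["rate_limit"]["remaining"],
        # True once the chosen share of the monthly quota has been consumed.
        "near_monthly_limit": used_fraction >= warn_fraction,
    }


# Illustrative values only; substitute the parsed JSON from your own request.
summary = summarize_usage(
    {
        "current_period": {"requests": 900, "limit": 1000, "remaining": 100},
        "rate_limit": {"requests_per_minute": 10, "current_usage": 3, "remaining": 7},
    }
)
```

Running this check on a schedule gives you time to slow down or upgrade before requests start failing.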

Upgrading Your Tier

To increase your rate limits:

Visit app.aqta.ai/pricing
Select a higher tier
Complete payment
Your new limits apply immediately

No downtime: Rate limit changes are applied instantly.

FAQ

Q: What happens if I exceed my monthly limit?

A: You'll receive 429 errors until the next billing cycle (1st of the month). Upgrade your tier to increase limits immediately.

Q: Can I purchase additional requests?

A: Not currently. Upgrade to a higher tier for more requests.

Q: Do rate limits apply per API key or per account?

A: Per API key. Create multiple keys to distribute load.

Q: What if I need higher limits temporarily?

A: Contact support@aqta.ai. We can temporarily increase limits for special events.

Q: Are rate limits enforced during free trials?

A: Yes. Free tier limits apply during trials.

Next Steps

Authentication - Get your API key
API Endpoints - Available endpoints
Pricing - Upgrade your tier

Questions? Contact us at hello@aqta.ai