Aqta

Rate Limits

Aqta implements rate limits to ensure fair usage and system stability. Rate limits vary by tier and are enforced per API key.


Rate Limits by Tier

Tier        Requests/Month  Rate Limit  Burst Limit  Models
Free        500             5/min       10           GPT-3.5 only
Starter     100,000         100/min     200          All models
Pro         1,000,000       1,000/min   2,000        All models
Healthcare  1,000,000       1,000/min   2,000        All models

What is Burst Limit?

The burst limit lets you briefly exceed your sustained rate limit. For example, the Free tier accepts up to 10 requests in a quick burst, even though its sustained rate is 5/min.
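One common way to model a sustained rate plus a burst allowance is a token bucket. The sketch below is a client-side illustration only: it assumes token-bucket behavior, which this page does not confirm for Aqta's servers, and the TokenBucket class and its parameter names are hypothetical.

```python
import time

class TokenBucket:
    """Token bucket: `capacity` is the burst limit, `rate_per_min` the sustained rate."""

    def __init__(self, rate_per_min, capacity, clock=time.monotonic):
        self.rate_per_sec = rate_per_min / 60.0
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full, so a whole burst is available
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Return True and spend one token if a request may go out now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Free tier figures from the table: 5/min sustained, burst of 10.
bucket = TokenBucket(rate_per_min=5, capacity=10)
allowed = sum(bucket.allow() for _ in range(15))
print(allowed)  # 10: the burst is spent, later calls must wait for refill
```

Calling bucket.allow() before each request and pausing when it returns False keeps a client under both figures without waiting for a 429.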

Model Restrictions

Free Tier:

  • ✅ GPT-3.5-turbo
  • ✅ Claude 3 Haiku
  • ✅ Gemini Flash
  • ❌ GPT-4 (upgrade to Starter)
  • ❌ Claude 3.5 Sonnet (upgrade to Starter)
  • ❌ GPT-4 Turbo (upgrade to Starter)

Starter, Pro, Healthcare:

  • ✅ All models available

Why? The Free tier is limited to cost-effective models to keep the service sustainable while still letting you test the platform.


How Rate Limits Work

Per-Minute Limits

Rate limits are calculated using a sliding window:

Time:     0s    10s   20s   30s   40s   50s   60s
Requests: 5     2     3     0     0     0     0
Total:    5     7     10    10    10    10    10 (in last 60s)

If you hit the limit, you'll receive a 429 Too Many Requests error.
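The timeline above can be replayed with a small sliding-window counter. This is a minimal sketch, not Aqta's implementation: the SlidingWindowCounter class is hypothetical, and the limit of 10 requests per 60 seconds is an assumption chosen to match the totals shown.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts requests in the trailing `window` seconds (a sliding-window log)."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def _evict(self, now):
        # Drop requests that are older than the window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()

    def count(self, now):
        """Number of recorded requests still inside the window at time `now`."""
        self._evict(now)
        return len(self.timestamps)

    def try_request(self, now):
        """Record a request at `now` if under the limit; return whether it was allowed."""
        self._evict(now)
        if len(self.timestamps) >= self.limit:
            return False  # a server would answer 429 here
        self.timestamps.append(now)
        return True

# Replay the timeline: 5 requests at 0s, 2 at 10s, 3 at 20s (assumed limit: 10).
counter = SlidingWindowCounter(limit=10)
for t, n in [(0, 5), (10, 2), (20, 3)]:
    for _ in range(n):
        counter.try_request(t)

print(counter.count(30))        # 10
print(counter.try_request(30))  # False
```

Capacity returns gradually as old requests age out of the window, rather than all at once at the top of the minute.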

Monthly Limits

Monthly limits reset on the 1st of each month at 00:00 UTC.

Example:

  • Free tier: 500 requests/month
  • If you use all 500 requests by January 15th, you'll be rate limited until February 1st
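To tell users how long they must wait once the monthly quota is spent, a client can compute the next reset itself. The helper below is a small sketch that assumes only the rule stated above (reset on the 1st at 00:00 UTC); the function name next_monthly_reset is hypothetical.

```python
from datetime import datetime, timezone

def next_monthly_reset(now):
    """Return 00:00 UTC on the 1st of the month after `now` (a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1   # December rolls over to January
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)

now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
reset = next_monthly_reset(now)
print(reset.isoformat())   # 2026-02-01T00:00:00+00:00
print((reset - now).days)  # 16
```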

Rate Limit Headers

Every API response includes rate limit headers:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1677652348

Header                 Description
X-RateLimit-Limit      Maximum requests per minute
X-RateLimit-Remaining  Remaining requests in current window
X-RateLimit-Reset      Unix timestamp when limit resets
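These headers arrive as strings, so it helps to convert them once. The following sketch parses the three headers from any name-to-value mapping and derives the seconds left until the reset; the function name and the shape of the returned dictionary are illustrative assumptions.

```python
def parse_rate_limit_headers(headers, now):
    """Turn the three X-RateLimit-* headers into numbers.

    `headers` is any mapping of header name to string value; `now` is the
    current Unix time in seconds. Missing headers yield None.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        value = lowered.get(name)
        return int(value) if value is not None else None

    reset = as_int("x-ratelimit-reset")
    return {
        "limit": as_int("x-ratelimit-limit"),
        "remaining": as_int("x-ratelimit-remaining"),
        "reset": reset,
        # Clamp at zero in case the reset time has already passed.
        "seconds_until_reset": max(0, reset - int(now)) if reset is not None else None,
    }

sample = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "95",
    "X-RateLimit-Reset": "1677652348",
}
info = parse_rate_limit_headers(sample, now=1677652318)
print(info["remaining"], info["seconds_until_reset"])  # 95 30
```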

Handling Rate Limits

429 Error Response

When you exceed the rate limit:

{
  "error": {
    "message": "Rate limit exceeded. Try again in 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 30
  }
}
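If your HTTP client gives you the raw response body, the wait time can be read from the retry_after field shown above. This is a minimal sketch: the function name and the 60-second fallback are assumptions, and it relies only on the body format in the example.

```python
import json

def retry_delay_from_body(body_text, default=60.0):
    """Return `retry_after` (seconds) from a 429 error body, or `default` if unavailable."""
    try:
        error = json.loads(body_text)["error"]
        if error.get("code") == "rate_limit_exceeded":
            return float(error["retry_after"])
    except (ValueError, KeyError, TypeError, AttributeError):
        pass  # malformed or unexpected body: fall back to the default
    return default

body = """
{
  "error": {
    "message": "Rate limit exceeded. Try again in 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 30
  }
}
"""
print(retry_delay_from_body(body))  # 30.0
```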

Retry Logic (Python)

# Uses the legacy OpenAI Python SDK interface (openai<1.0).
import openai
import time

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(
                model="gpt-4",
                messages=messages
            )
        except openai.error.RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Wait as long as the server asks; default to 60s if the header is absent.
            retry_after = float(e.headers.get('Retry-After', 60))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)

response = make_request_with_retry([
    {"role": "user", "content": "Hello!"}
])

Exponential Backoff (JavaScript)

import OpenAI from 'openai';

// Configure the client (API key, base URL) for your Aqta setup.
const openai = new OpenAI();

async function makeRequestWithBackoff(messages, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await openai.chat.completions.create({
        model: 'gpt-4',
        messages,
      });
    } catch (error) {
      if (error.status === 429 && attempt < maxRetries - 1) {
        // Honor the server's Retry-After (seconds) when present; otherwise rely on backoff.
        const retryAfter = Number(error.headers?.['retry-after']) || 0;
        const backoff = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, ...
        const delay = Math.max(retryAfter * 1000, backoff);

        console.log(`Rate limited. Retrying in ${delay / 1000}s...`);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
}

const response = await makeRequestWithBackoff([
  { role: 'user', content: 'Hello!' }
]);

Best Practices

1. Monitor Rate Limit Headers

Always check X-RateLimit-Remaining to avoid hitting limits:

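# Assumes the response object exposes the HTTP headers as a mapping.
response = openai.ChatCompletion.create(...)

# Only warn when the header is actually present; a missing header is not "0 remaining".
remaining = response.headers.get('X-RateLimit-Remaining')
if remaining is not None and int(remaining) < 10:
    print(f"Warning: Only {remaining} requests remaining!")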

2. Implement Exponential Backoff

Use exponential backoff when retrying failed requests:

Attempt 1: Wait 1s
Attempt 2: Wait 2s
Attempt 3: Wait 4s
Attempt 4: Wait 8s
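The schedule above doubles the wait on each attempt. A small sketch that generates it is shown here; the cap and the optional jitter are additions that this page does not specify, and the function name backoff_delays is hypothetical.

```python
import random

def backoff_delays(attempts, base=1.0, cap=60.0, jitter=False, rng=random):
    """Return the wait in seconds before each retry: base, 2*base, 4*base, ... up to `cap`."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # "Full jitter": pick uniformly in [0, delay] so clients do not retry in lockstep.
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays

print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
```

Jitter is worth enabling when many workers share one API key, since otherwise they all retry at the same instants.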

3. Batch Requests

Instead of making many small requests, batch them when possible:

# Bad: 100 separate requests
for item in items:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Process {item}"}]
    )

# Good: 1 request with batched content
content = "\n".join([f"Process {item}" for item in items])
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": content}]
)
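A single prompt holding every item can grow too large for the model to handle well, so in practice batching usually means a few medium-sized requests. The sketch below splits items into chunks and numbers each line so answers can be matched back; the helper names and the chunk size of 20 are illustrative assumptions.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_batched_prompts(items, size=20):
    """Build one prompt per chunk, with numbered lines for matching answers to items."""
    prompts = []
    for chunk in chunked(items, size):
        lines = [f"{i}. Process {item}" for i, item in enumerate(chunk, start=1)]
        prompts.append("\n".join(lines))
    return prompts

prompts = build_batched_prompts([f"item-{n}" for n in range(100)], size=20)
print(len(prompts))  # 5 requests instead of 100
```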

4. Use Caching

Cache responses to avoid redundant requests:

import functools
import openai

# Keep up to 100 distinct prompts in memory; identical prompts skip the API call.
@functools.lru_cache(maxsize=100)
def get_completion(prompt):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Subsequent calls with same prompt use cache
result1 = get_completion("What is AI?")
result2 = get_completion("What is AI?")  # Uses cache

5. Upgrade Your Tier

If you consistently hit rate limits, consider upgrading:

  • Free → Starter: 200x more monthly requests (€49/month)
  • Starter → Pro: 10x more requests (€149/month)
  • Pro → Healthcare: Same limits, plus HIPAA compliance (€999/month)

Rate Limit Exceptions

Streaming Requests

Streaming requests count as a single request, regardless of how long the stream is open.

Failed Requests

Failed requests (4xx, 5xx errors) still count toward your rate limit.

Webhook Callbacks

Webhook callbacks from Aqta do not count toward your rate limit.


Monitoring Usage

Dashboard

View your usage in real-time at app.aqta.ai/analytics:

  • Requests per minute (current)
  • Requests this month (total)
  • Remaining requests
  • Rate limit status

API

Get usage programmatically:

curl https://api.aqta.ai/v1/usage \
  -H "Authorization: Bearer sk-aqta-your-key-here"

Response:

{
  "current_period": {
    "start": "2026-02-01T00:00:00Z",
    "end": "2026-03-01T00:00:00Z",
    "requests": 5432,
    "limit": 10000,
    "remaining": 4568
  },
  "rate_limit": {
    "requests_per_minute": 10,
    "current_usage": 3,
    "remaining": 7
  }
}
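Once you have this response, a few derived numbers make it easier to alert before you run out. The sketch below works on the parsed JSON only; the function name and the 80% alert threshold are hypothetical choices, not part of the API.

```python
import json

def summarize_usage(payload):
    """Summarize the monthly and per-minute figures from a usage response."""
    period = payload["current_period"]
    rate = payload["rate_limit"]
    return {
        "monthly_used_fraction": period["requests"] / period["limit"],
        "monthly_remaining": period["remaining"],
        "minute_remaining": rate["remaining"],
        # Hypothetical alert threshold: flag when 80% of the monthly quota is gone.
        "near_monthly_limit": period["requests"] >= 0.8 * period["limit"],
    }

sample = json.loads("""
{
  "current_period": {
    "start": "2026-02-01T00:00:00Z",
    "end": "2026-03-01T00:00:00Z",
    "requests": 5432,
    "limit": 10000,
    "remaining": 4568
  },
  "rate_limit": {"requests_per_minute": 10, "current_usage": 3, "remaining": 7}
}
""")
summary = summarize_usage(sample)
print(summary["monthly_used_fraction"])  # 0.5432
print(summary["near_monthly_limit"])     # False
```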

Upgrading Your Tier

To increase your rate limits:

  1. Visit app.aqta.ai/pricing
  2. Select a higher tier
  3. Complete payment
  4. Your new limits apply immediately

No downtime: Rate limit changes are applied instantly.


FAQ

Q: What happens if I exceed my monthly limit?

A: You'll receive 429 errors until the next billing cycle (1st of the month). Upgrade your tier to increase limits immediately.

Q: Can I purchase additional requests?

A: Not currently. Upgrade to a higher tier for more requests.

Q: Do rate limits apply per API key or per account?

A: Per API key. Create multiple keys to distribute load.

Q: What if I need higher limits temporarily?

A: Contact support@aqta.ai. We can temporarily increase limits for special events.

Q: Are rate limits enforced during free trials?

A: Yes. Free tier limits apply during trials.


Questions? Contact us at hello@aqta.ai