Hi, I've configured a gpt-4o model with a 50k tokens-per-minute limit, which Azure translates into 300 requests per minute, i.e. 50 requests per 10-second window. Despite this, I hit the limit after only ten or so requests when sending them in a…
With proper use of the rate-limit headers, a 429 can be avoided when requesting from a single thread, but 429s could still happen when using multi-threading, and it would help to have the x-ratelimit-* headers for the 200 ca...
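For the single-threaded case, the idea above can be sketched as a small pacing helper: before firing the next request, look at the previous response's rate-limit headers and wait only if the budget is exhausted. The header names and the assumption that the reset value is "seconds until the window refills" are illustrative, not a documented contract:

```python
def pace_request(headers: dict) -> float:
    """Return how many seconds to wait before the next request,
    based on x-ratelimit-* response headers from the previous call.

    Assumptions (hypothetical, verify against your provider's docs):
    - "x-ratelimit-remaining-requests" is an integer request budget
    - "x-ratelimit-reset-requests" is seconds until the window resets
    """
    remaining = int(headers.get("x-ratelimit-remaining-requests", 1))
    reset_s = float(headers.get("x-ratelimit-reset-requests", 0))
    if remaining > 0:
        return 0.0   # budget left in this window: send immediately
    return reset_s   # budget exhausted: wait out the window
```

With multiple threads this per-response view is not enough, since several workers can spend the same "remaining" budget concurrently; a shared counter or token bucket would be needed on top.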
This would work, and people would be able to request cached repositories without wasting any rate limit, but in the end you might run into load issues and, in the worst case, have to implement your own rate limiting. About asking GitHub: as you said, this is only per-interval, and for nipster...
ThrottleRequestsException {#45
  -statusCode: 429
  -headers: array:4 [
    "X-RateLimit-Limit" => 1
    "X-RateLimit-Remaining" => 0
    "Retry-After" => 57
    "X-RateLimit-Reset" => 1604046100
  ]
  #message: "Too Many Attempts."
  #code: 0
  #file: "/var/www/html/app/Http/Middleware/ThrottleR...
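A dump like this carries everything a client needs to recover: on a 429, sleep for the number of seconds given in Retry-After and try again. A minimal retry wrapper, sketched in Python with a hypothetical `send` callable standing in for whatever HTTP client you use:

```python
import time

def call_with_retry(send, max_attempts: int = 3):
    """Retry a request when the server answers 429, honouring Retry-After.

    `send` is a hypothetical zero-argument callable returning
    (status_code, headers, body); swap in your real HTTP client.
    """
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        # Retry-After here is delay-in-seconds; fall back to 1s if absent
        time.sleep(int(headers.get("Retry-After", 1)))
    return status, headers, body  # still throttled after all attempts
```

Note that Retry-After may also be an HTTP date rather than a delay in seconds; the sketch handles only the seconds form shown in the dump above.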
For every network request, OpenAI includes x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens, and x-request-time in the response headers. It would be nice to know how to read the values of these limits from an axios or curl query.
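One way to answer this is to treat it as plain header extraction: any HTTP client exposes the response headers as a mapping, and you pick out the rate-limit keys case-insensitively. A sketch using the header names quoted above (assumed, not verified against current docs):

```python
def read_rate_limits(headers: dict) -> dict:
    """Pick the rate-limit fields out of a response-header mapping.

    Key names follow the ones quoted in the question; HTTP header
    names are case-insensitive, so matching is done in lowercase.
    """
    wanted = {
        "x-ratelimit-remaining-requests",
        "x-ratelimit-remaining-tokens",
        "x-request-time",
    }
    lowered = {k.lower(): v for k, v in headers.items()}
    return {k: lowered[k] for k in wanted if k in lowered}
```

With curl, passing `-i` (or `-D -`) prints the response headers so you can see these fields directly; with axios, the same mapping is available on `response.headers`.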