
[Subtask] Implement concurrency management with asyncio #8

Closed
3 tasks done
debuggerone opened this issue Sep 5, 2024 · 2 comments

debuggerone (Collaborator) commented Sep 5, 2024

Subtask Overview

This subtask involves implementing concurrency management using Python’s asyncio to handle parallel tasks effectively during the AgentM migration.
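As a minimal sketch of what bounded fan-out with asyncio could look like (the `run_agent` coroutine and the concurrency limit are placeholders, not part of the existing codebase):

```python
import asyncio

# Hypothetical stand-in for a single agent call; not part of the existing codebase.
async def run_agent(prompt: str) -> str:
    await asyncio.sleep(0.1)  # simulate an API round trip
    return f"result for: {prompt}"

async def run_all(prompts: list[str], max_concurrency: int = 5) -> list[str]:
    # Bound how many agents run at once with a semaphore,
    # then fan the work out with asyncio.gather.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> str:
        async with sem:
            return await run_agent(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

if __name__ == "__main__":
    print(asyncio.run(run_all(["a", "b", "c"])))
```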

Tasks

  • Review JavaScript concurrency logic and translate it to Python’s asyncio framework.
  • Implement Python functions for managing parallel tasks using asyncio.
  • Test concurrency functions to ensure they are efficient and handle parallel operations correctly.

Acceptance Criteria

  • Concurrency management is implemented using Python’s asyncio.
  • Functions are translated from JavaScript to Python with proper async/await syntax.
  • Concurrency functions are tested and perform as expected under various scenarios.

Additional Info

Some kind of data storage is needed to keep track of rate limits.

Before an agent starts to run, we use tiktoken to measure the size of the prompt, add the desired max_tokens value, and then look up the tokens and requests already used in the current minute.

If there is at least one request left, we add the calculated token count of the current request to the tokens used in the current minute; if that total is below the rate limit, we can start the process.
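A rough sketch of that check, assuming simple in-memory storage, an illustrative model name, and made-up per-minute limits (the real values depend on the OpenAI account tier):

```python
import time
import tiktoken

# Illustrative per-minute limits; the real values depend on the account tier.
TPM_LIMIT = 90_000   # tokens per minute
RPM_LIMIT = 3_500    # requests per minute

# Simple in-memory storage keyed by the current minute.
_usage = {"minute": 0, "tokens": 0, "requests": 0}

def can_start(prompt: str, max_tokens: int, model: str = "gpt-4o") -> bool:
    """Return True if the request fits into the current minute's budget."""
    enc = tiktoken.encoding_for_model(model)
    needed = len(enc.encode(prompt)) + max_tokens

    minute = int(time.time() // 60)
    if minute != _usage["minute"]:
        _usage.update(minute=minute, tokens=0, requests=0)

    if _usage["requests"] + 1 > RPM_LIMIT:
        return False
    if _usage["tokens"] + needed > TPM_LIMIT:
        return False

    _usage["requests"] += 1
    _usage["tokens"] += needed
    return True
```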

Stevenic (Owner) commented Sep 9, 2024

> Some kind of data storage is needed to keep track of rate limits.
>
> Before an agent starts to run, we use tiktoken to measure the size of the prompt, add the desired max_tokens value, and then look up the tokens and requests already used in the current minute.
>
> If there is at least one request left, we add the calculated token count of the current request to the tokens used in the current minute; if that total is below the rate limit, we can start the process.

I mentioned this in the discussion thread. I don't think you want to overthink this. For rate limiting, it's honestly better to let the error happen and then back off for the period mentioned in the response header, and I believe OpenAI's library does that automatically. I'm happy to discuss the various cons of trying to track this stuff in a database.
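For reference, leaning on the client's built-in retries could look like this with the openai v1.x Python SDK, which retries rate-limit (429) responses with exponential backoff; the model name and retry count here are illustrative:

```python
import asyncio
from openai import AsyncOpenAI

# max_retries controls how many retry attempts the SDK makes before giving up.
client = AsyncOpenAI(max_retries=5)

async def complete(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(asyncio.run(complete("Hello")))
```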

debuggerone (Collaborator, Author) commented
Nah, you are right. I've removed the database.

debuggerone added a commit that referenced this issue Sep 21, 2024