
Prioritize retry state over created for same singleton key in stately fetch #536

Open · wants to merge 1 commit into master
Conversation

foteiniskiada

Issue: #535

@klesgidis (Contributor)

@timgit could you please take a look? We are experiencing issues in our production application.

@timgit (Owner) commented Jan 26, 2025

Thanks for pointing out this limitation with stately queues. You're correct in your assessment regarding how to extend a stately queue with a singletonKey. After reviewing the PR, I think it adds too much complexity to enforce across all queue types, and would negatively impact performance in a larger queue. I also don't think it will correctly resolve all failure use cases.

The failure case you're seeing is because once the next job would produce a unique constraint violation (with or without a singletonKey), all job processing will be blocked.
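For context, the constraint in question is a partial unique index. A rough sketch of its shape (index and column names here are assumptions for illustration, not pg-boss's actual DDL):

```sql
-- Illustrative only: an index in this spirit is what turns the second
-- same-key activation into a constraint violation. Names are assumed,
-- not copied from pg-boss's schema.
CREATE UNIQUE INDEX job_stately_singleton
  ON pgboss.job (name, state, singleton_key)
  WHERE policy = 'stately' AND state <= 'active';
```

Under an index like this, at most one job per queue, state, and key can exist up to the active state, so a batch that tries to move two same-key jobs to active fails as a whole.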

One example of how this fails is:

  1. Create job A with singletonKey=123. It's in created state.
  2. Create job B with singletonKey=123. Job B is rejected correctly because of the unique constraint. This is the happy path use case.
  3. Fetch, putting job A in active state.
  4. Create job C with singletonKey=123. It's in created state.
  5. Create job D with singletonKey=456. It's in created state.
  6. Fail job A, putting it in retry state.
  7. Now, attempt to fetch 2 jobs. Since the default sort is by creation date, the next 2 jobs have the same singletonKey. This attempts to set both jobs to active state, but only 1 is allowed.

The only way I see to avoid this use case is to not use batching with stately queues. The batch processing SQL statement needs to be enhanced to allow dropping one of the previously accepted jobs. This feels like a gray area since the job was previously accepted, but stately queues are already the type of policy that is accustomed to dropping jobs. In its current state, batching with a unique constraint violation will block all processing until the batch size is reduced back to 1.

Another side effect of this behavior is more closely related to this PR, which adds a sort condition for the singletonKey. However, this would still produce a processing limitation once a conflict is experienced on a particular key. Once a unique constraint is triggered for any job, no other jobs can be processed. This more closely aligns with the original intent of these queue policies, which is to reduce concurrency as much as possible.
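For illustration, a fetch in the spirit of the PR might look roughly like the following. This is a sketch under assumed table and column names, not the SQL from the patch:

```sql
-- Sketch only: table/column names assumed from pg-boss's schema.
WITH candidates AS (
  SELECT DISTINCT ON (singleton_key) id
  FROM pgboss.job
  WHERE name = 'stately-q'
    AND state < 'active'
    AND start_after < now()
  ORDER BY singleton_key,
           state DESC,   -- retry sorts after created, so DESC prefers retry
           created_on
  LIMIT 2                 -- batchSize
  -- note: jobs without a singletonKey would all collapse into one row
  -- here; a real query would need to handle NULL keys separately
),
locked AS (
  -- FOR UPDATE is not allowed together with DISTINCT, so lock in a second step
  SELECT j.id
  FROM pgboss.job j
  JOIN candidates c ON c.id = j.id
  FOR UPDATE SKIP LOCKED
)
UPDATE pgboss.job j
SET state = 'active', started_on = now()
FROM locked
WHERE j.id = locked.id
RETURNING j.id;
```

Note that DISTINCT ON only dedupes within the fetched batch; a candidate whose singletonKey already has an active job would still trip the unique index, which is the limitation described above.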

@klesgidis (Contributor)

Thank you for your detailed response and for the great work you are doing with pgboss. We truly appreciate your efforts in maintaining and improving this library.

> The only way I see to avoid this use case is to not use batching with stately queues. The batch processing SQL statement needs to be enhanced to allow dropping one of the previously accepted jobs. This feels like a gray area since the job was previously accepted, but stately queues are already the type of policy that is accustomed to dropping jobs. In its current state, batching with a unique constraint violation will block all processing until the batch size is reduced back to 1.

We understand your perspective; however, using batchSize=1 is not a viable solution for us due to the scale of our production application. To provide some context, our system processes hundreds of jobs per second. Reducing the batch size to 1 would result in significant delays and impact overall performance. Therefore, we need to find a solution that allows us to continue utilizing batch processing while addressing the unique constraint issue introduced in version 10. This is why we proposed the PR.

> After reviewing the PR, I think it adds too much complexity to enforce across all queue types, and would negatively impact performance in a larger queue. I also don't think it will correctly resolve all failure use cases.

Could you provide more details or results from performance tests that highlight this impact? In our view, a queue processing one job per query (batch size 1) represents a far greater performance concern for the entire system. We would be interested in understanding how the proposed changes specifically add complexity or impact performance in large queues.

> Another side effect of this behavior is more closely related to this PR, which adds a sort condition for the singletonKey. However, this would still produce a processing limitation once a conflict is experienced on a particular key. Once a unique constraint is triggered for any job, no other jobs can be processed. This more closely aligns with the original intent of these queue policies, which is to reduce concurrency as much as possible.

Could you clarify how a conflict would occur, given that the implementation uses DISTINCT on the singletonKey? From our understanding, this should prevent such conflicts from arising.

We have been attempting to upgrade to v10 for nearly two months now but are facing significant performance issues without batch processing. We sincerely hope to find a resolution through collaboration, as we value the capabilities of pgboss. However, if we cannot maintain the necessary system performance, we may need to explore alternative solutions.

Once again, thank you for your hard work and for taking the time to consider our input. We look forward to hearing your thoughts on this matter.

@klesgidis (Contributor)

@timgit any update on this?
