
Prioritize retry state over created for same singleton key in stately fetch #536

Open · wants to merge 1 commit into master
Conversation

foteiniskiada

Issue: #535

@klesgidis (Contributor)

@timgit could you please take a look? We are experiencing issues in our production application.

@timgit (Owner) commented Jan 26, 2025

Thanks for pointing out this limitation with stately queues. You're correct in your assessment regarding how to extend a stately queue with a singletonKey. After reviewing the PR, I think it adds too much complexity to enforce across all queue types, and would negatively impact performance in a larger queue. I also don't think it will correctly resolve all failure use cases.

The failure case you're seeing is because once the next job would produce a unique constraint violation (with or without a singletonKey), all job processing will be blocked.
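For context, the constraint in question is a partial unique index. A rough sketch of its shape (index and column names here are assumptions for illustration, not pg-boss's actual DDL):

```sql
-- Illustrative only: an index in this spirit is what turns the second
-- same-key activation into a constraint violation. Names are assumed,
-- not copied from pg-boss's schema.
CREATE UNIQUE INDEX job_stately_singleton
  ON pgboss.job (name, state, singleton_key)
  WHERE policy = 'stately' AND state <= 'active';
```

Under an index like this, at most one job per queue, state, and key can exist up to the active state, so a batch that tries to move two same-key jobs to active fails as a whole.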

One example of how this fails is:

  1. Create job A with singletonKey=123. It's in created state.
  2. Create job B with singletonKey=123. Job B is rejected correctly because of the unique constraint. This is the happy path use case.
  3. Fetch, putting job A in active state.
  4. Create job C with singletonKey=123. It's in created state.
  5. Create job D with singletonKey=456. It's in created state.
  6. Fail job A, putting it in retry state.
  7. Now, attempt to fetch 2 jobs. Since the default sort is by creation date, the next 2 jobs have the same singletonKey. This attempts to set both jobs to active state, but only 1 is allowed.

The only way I see to avoid this use case is to not use batching with stately queues. The batch processing SQL statement needs to be enhanced to allow dropping one of the previously accepted jobs. This feels like a gray area since the job was previously accepted, but stately queues are already the type of policy that is accustomed to dropping jobs. In its current state, batching with a unique constraint violation will block all processing until the batch size is reduced back to 1.

Another side effect of this behavior is more closely related to this PR, which adds a sort condition for the singletonKey. However, this would still produce a processing limitation once a conflict is experienced on a particular key. Once a unique constraint is triggered for any job, no other jobs can be processed. This more closely aligns with the original intent of these queue policies, which is to reduce concurrency as much as possible.
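For illustration, a fetch in the spirit of the PR might look roughly like the following. This is a sketch under assumed table and column names, not the SQL from the patch:

```sql
-- Sketch only: table/column names assumed from pg-boss's schema.
WITH candidates AS (
  SELECT DISTINCT ON (singleton_key) id
  FROM pgboss.job
  WHERE name = 'stately-q'
    AND state < 'active'
    AND start_after < now()
  ORDER BY singleton_key,
           state DESC,   -- retry sorts after created, so DESC prefers retry
           created_on
  LIMIT 2                 -- batchSize
  -- note: jobs without a singletonKey would all collapse into one row
  -- here; a real query would need to handle NULL keys separately
),
locked AS (
  -- FOR UPDATE is not allowed together with DISTINCT, so lock in a second step
  SELECT j.id
  FROM pgboss.job j
  JOIN candidates c ON c.id = j.id
  FOR UPDATE SKIP LOCKED
)
UPDATE pgboss.job j
SET state = 'active', started_on = now()
FROM locked
WHERE j.id = locked.id
RETURNING j.id;
```

Note that DISTINCT ON only dedupes within the fetched batch; a candidate whose singletonKey already has an active job would still trip the unique index, which is the limitation described above.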

@klesgidis (Contributor)

Thank you for your detailed response and for the great work you are doing with pgboss. We truly appreciate your efforts in maintaining and improving this library.

> The only way I see to avoid this use case is to not use batching with stately queues. The batch processing SQL statement needs to be enhanced to allow dropping one of the previously accepted jobs. This feels like a gray area since the job was previously accepted, but stately queues are already the type of policy that is accustomed to dropping jobs. In its current state, batching with a unique constraint violation will block all processing until the batch size is reduced back to 1.

We understand your perspective; however, using batchSize=1 is not a viable solution for us due to the scale of our production application. To provide some context, our system processes hundreds of jobs per second. Reducing the batch size to 1 would result in significant delays and impact overall performance. Therefore, we need to find a solution that allows us to continue utilizing batch processing while addressing the unique constraint issue introduced in version 10. This is why we proposed the PR.

> After reviewing the PR, I think it adds too much complexity to enforce across all queue types, and would negatively impact performance in a larger queue. I also don't think it will correctly resolve all failure use cases.

Could you provide more details or results from performance tests that highlight this impact? In our view, a queue processing one job per query (batch size 1) represents a far greater performance concern for the entire system. We would be interested in understanding how the proposed changes specifically add complexity or impact performance in large queues.

> Another side effect of this behavior is more closely related to this PR, which adds a sort condition for the singletonKey. However, this would still produce a processing limitation once a conflict is experienced on a particular key. Once a unique constraint is triggered for any job, no other jobs can be processed. This more closely aligns with the original intent of these queue policies, which is to reduce concurrency as much as possible.

Could you clarify how a conflict would occur, given that the implementation uses DISTINCT on the singletonKey? From our understanding, this should prevent such conflicts from arising.

We have been attempting to upgrade to v10 for nearly two months now but are facing significant performance issues without batch processing. We sincerely hope to find a resolution through collaboration, as we value the capabilities of pgboss. However, if we cannot maintain the necessary system performance, we may need to explore alternative solutions.

Once again, thank you for your hard work and for taking the time to consider our input. We look forward to hearing your thoughts on this matter.

@klesgidis (Contributor)

@timgit any update on this?
