Skip to content

Commit

Permalink
Get rid of noisy task adding failure log in matching service (cadence…
Browse files Browse the repository at this point in the history
…-workflow#5445)

What changed?
Failed to add task error log for remote sync match failed scenario seems to be too noisy (96% of matching logs). We log it for multiple cases such as:
- task forwarded from child partition but domain is not active in current cluster (ref)
- task forwarded from child partition but couldn't be sync matched (ref)

There's already a metric cadence_errors_remote_syncmatch_failed_per_tl counting this specific case and it has the same breakdown as the log does:
- domain
- tasklist name
- operation (adddecision vs addactivity task)

So the change is to remove this log line. Other sub categories of Failed to add task error log will still be logged by the matching context's handleErr as an Uncategorized error. This seems fine to me but open for feedback here. We can keep those other sub types around and just get rid of remote sync match failed because it's a known type w/ its own counter.

Misc Change:
Only signal taskReader if there's no error and no sync match found. Previously it was being signalled even for sync matches but that looked unnecessary as nothing is written to persistence for sync match. This can help reduce the unnecessary DB calls during sync matches.
  • Loading branch information
taylanisikdemir authored Nov 14, 2023
1 parent ac884c3 commit 10ae6c4
Show file tree
Hide file tree
Showing 2 changed files with 13 additions and 21 deletions.
26 changes: 12 additions & 14 deletions service/matching/context.go
Original file line number Diff line number Diff line change
Expand Up @@ -87,45 +87,43 @@ func (reqCtx *handlerContext) handleErr(err error) error {
return nil
}

scope := reqCtx.scope

switch err.(type) {
case *types.InternalServiceError:
scope.IncCounter(metrics.CadenceFailuresPerTaskList)
reqCtx.scope.IncCounter(metrics.CadenceFailuresPerTaskList)
reqCtx.logger.Error("Internal service error", tag.Error(err))
return err
case *types.BadRequestError:
scope.IncCounter(metrics.CadenceErrBadRequestPerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrBadRequestPerTaskListCounter)
return err
case *types.EntityNotExistsError:
scope.IncCounter(metrics.CadenceErrEntityNotExistsPerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrEntityNotExistsPerTaskListCounter)
return err
case *types.WorkflowExecutionAlreadyStartedError:
scope.IncCounter(metrics.CadenceErrExecutionAlreadyStartedPerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrExecutionAlreadyStartedPerTaskListCounter)
return err
case *types.DomainAlreadyExistsError:
scope.IncCounter(metrics.CadenceErrDomainAlreadyExistsPerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrDomainAlreadyExistsPerTaskListCounter)
return err
case *types.QueryFailedError:
scope.IncCounter(metrics.CadenceErrQueryFailedPerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrQueryFailedPerTaskListCounter)
return err
case *types.LimitExceededError:
scope.IncCounter(metrics.CadenceErrLimitExceededPerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrLimitExceededPerTaskListCounter)
return err
case *types.ServiceBusyError:
scope.IncCounter(metrics.CadenceErrServiceBusyPerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrServiceBusyPerTaskListCounter)
return err
case *types.DomainNotActiveError:
scope.IncCounter(metrics.CadenceErrDomainNotActivePerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrDomainNotActivePerTaskListCounter)
return err
case *types.RemoteSyncMatchedError:
scope.IncCounter(metrics.CadenceErrRemoteSyncMatchFailedPerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrRemoteSyncMatchFailedPerTaskListCounter)
return err
case *types.StickyWorkerUnavailableError:
scope.IncCounter(metrics.CadenceErrStickyWorkerUnavailablePerTaskListCounter)
reqCtx.scope.IncCounter(metrics.CadenceErrStickyWorkerUnavailablePerTaskListCounter)
return err
default:
scope.IncCounter(metrics.CadenceFailuresPerTaskList)
reqCtx.scope.IncCounter(metrics.CadenceFailuresPerTaskList)
reqCtx.logger.Error("Uncategorized error", tag.Error(err))
return &types.InternalServiceError{Message: err.Error()}
}
Expand Down
8 changes: 1 addition & 7 deletions service/matching/taskListManager.go
Original file line number Diff line number Diff line change
Expand Up @@ -309,13 +309,7 @@ func (c *taskListManagerImpl) AddTask(ctx context.Context, params addTaskParams)
return c.taskWriter.appendTask(params.execution, params.taskInfo)
})

if err != nil {
c.logger.Error("Failed to add task",
tag.Error(err),
tag.WorkflowTaskListName(c.taskListID.name),
tag.WorkflowTaskListType(c.taskListID.taskType),
)
} else {
if err == nil && !syncMatch {
c.taskReader.Signal()
}

Expand Down

0 comments on commit 10ae6c4

Please sign in to comment.