Error Handling

Handle message processing failures in Nimbus using retries, dead letter queues, and error interceptors

Overview

Distributed messaging introduces failure modes that don’t exist in direct function calls: handlers can fail, the transport can be temporarily unavailable, and messages can be malformed. Nimbus and the underlying transports provide several mechanisms to handle these gracefully.

What Happens When a Handler Throws

When a handler throws an exception, Nimbus does not acknowledge the message. The transport retries delivery according to its configured policy:

Message arrives
    │
    ▼
Handler throws exception
    │
    ▼
Message is NOT acknowledged (nacked)
    │
    ▼
Transport re-queues the message
    │
    ▼
Retry after delay (transport-dependent)
    │
    ▼
[Repeat until max delivery attempts reached]
    │
    ▼
Message moved to dead letter queue

Your handler code doesn’t need to do anything special — just let exceptions propagate:

public class PlaceOrderHandler : IHandleCommand<PlaceOrderCommand>
{
    public async Task Handle(PlaceOrderCommand command)
    {
        // If this throws, Nimbus will retry the message
        var order = await _orderRepository.Save(command);

        // If this throws, Nimbus will retry the message
        // The repository operation may or may not have committed — design for idempotency
        await _bus.Publish(new OrderPlacedEvent { OrderId = order.Id });
    }
}
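
The redelivery flow above can be made concrete with a small in-memory simulation. This is an illustrative sketch only: `RedeliverySimulation` and `Deliver` are hypothetical names, not part of Nimbus. A real transport adds delays between attempts and moves an exhausted message to a dead letter queue instead of returning a flag.

```csharp
using System;

public static class RedeliverySimulation
{
    // Hypothetical helper: delivers one message to the handler until it
    // succeeds or maxDeliveryAttempts is reached. Returns the number of
    // attempts made; deadLettered reports whether retries were exhausted.
    public static int Deliver(Action handler, int maxDeliveryAttempts, out bool deadLettered)
    {
        for (var attempt = 1; attempt <= maxDeliveryAttempts; attempt++)
        {
            try
            {
                handler();
                deadLettered = false; // handler returned: message is acknowledged
                return attempt;
            }
            catch (Exception)
            {
                // handler threw: message is not acknowledged and is delivered again
            }
        }

        deadLettered = true; // every attempt failed: message would go to the DLQ
        return maxDeliveryAttempts;
    }
}
```

With a limit of 5, a handler that fails twice and then succeeds is acknowledged on the third attempt. A handler that always throws is attempted 5 times and then reported as dead-lettered.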

Retry Policies

Each transport has its own default retry behaviour:

Transport	Default Retries	Retry Strategy
Azure Service Bus	10 (configurable)	Exponential backoff
Redis	Immediate re-queue	No backoff
AMQP (ActiveMQ Artemis)	Broker-configured	Depends on broker
In-Process	None	Dropped or re-queued

Azure Service Bus

Configure max delivery attempts on the transport:

var transport = new AzureServiceBusTransportConfiguration()
    .WithConnectionString(connectionString)
    .WithMaxDeliveryAttempts(5);  // Retry up to 5 times before dead-lettering

Redis

Redis re-queues a failed message immediately, with no delay between attempts. For transient errors, implement backoff inside the handler:

public async Task Handle(PlaceOrderCommand command)
{
    var attempt = 0;
    while (true)
    {
        try
        {
            await _externalService.Process(command);
            return;
        }
        // Retry up to 3 times; a fourth failure propagates and the message is re-queued
        catch (TransientException) when (attempt < 3)
        {
            attempt++;
            // Exponential backoff: 2s, 4s, then 8s
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
        }
    }
}

Dead Letter Queues

After exhausting retries, messages are moved to a dead letter queue (DLQ). The DLQ acts as a quarantine for messages that couldn’t be processed — they can be inspected, fixed, and resubmitted.

Not all transports support dead letter queues natively. Azure Service Bus has built-in DLQ support. Redis and In-Process require manual dead letter handling.

Azure Service Bus DLQ

Azure Service Bus automatically creates a dead letter sub-queue for each queue and topic subscription. Messages moved there include the reason for dead-lettering:

DLQ Reason	Description
`MaxDeliveryCountExceeded`	Exceeded max retry attempts
`TTLExpiredException`	Message TTL expired before processing (only when dead-lettering on expiration is enabled)
`HeaderSizeExceeded`	Message headers too large

Monitoring Dead Letter Queues

Monitor DLQ depth as a key operational metric. A DLQ that keeps growing usually points to a systemic problem, such as a handler bug or a failing dependency, rather than an isolated bad message. To see why messages were dead-lettered, inspect them with the SDK:

// Azure Service Bus: inspect the DLQ using the SDK directly
var receiver = serviceBusClient.CreateReceiver(
    "OrderService.PlaceOrderCommand/$deadletterqueue");

// Peek at up to 20 dead-lettered messages without locking or removing them
var messages = await receiver.PeekMessagesAsync(maxMessages: 20);
foreach (var message in messages)
{
    Console.WriteLine($"Dead letter reason: {message.DeadLetterReason}");
    Console.WriteLine($"Error description: {message.DeadLetterErrorDescription}");
    Console.WriteLine($"Original message: {message.Body}");
}
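
The receiver above reads individual messages. To track depth as a number, the Azure SDK's administration client exposes a dead letter count in the queue's runtime properties. The sketch below assumes the `Azure.Messaging.ServiceBus` package and a connection string with Manage rights. `DeadLetterMonitor` and `ExceedsThreshold` are hypothetical names, and the threshold is something you would choose for your own system.

```csharp
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus.Administration;

public static class DeadLetterMonitor
{
    // Returns the number of messages currently in the queue's dead letter sub-queue.
    public static async Task<long> GetDeadLetterCount(string connectionString, string queueName)
    {
        var admin = new ServiceBusAdministrationClient(connectionString);
        var response = await admin.GetQueueRuntimePropertiesAsync(queueName);
        return response.Value.DeadLetterMessageCount;
    }

    // Hypothetical alert rule: true when the DLQ holds more messages than the threshold.
    public static bool ExceedsThreshold(long deadLetterCount, long threshold) =>
        deadLetterCount > threshold;
}
```

A scheduled job could call `GetDeadLetterCount(connectionString, "OrderService.PlaceOrderCommand")` and raise an alert when `ExceedsThreshold` returns true.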

Reprocessing Dead-Lettered Messages

After fixing the root cause, resubmit messages from the DLQ:

// Move from DLQ back to the main queue for reprocessing
var dlqReceiver = serviceBusClient.CreateReceiver(
    "OrderService.PlaceOrderCommand/$deadletterqueue");
var sender = serviceBusClient.CreateSender("OrderService.PlaceOrderCommand");

var messages = await dlqReceiver.ReceiveMessagesAsync(maxMessages: 100);
foreach (var message in messages)
{
    // Copy the whole message so MessageId and application properties survive the round trip
    var resent = new ServiceBusMessage(message);
    await sender.SendMessageAsync(resent);
    await dlqReceiver.CompleteMessageAsync(message);
}

Idempotent Handlers

Because messages can be delivered more than once (retries, at-least-once delivery semantics), handlers must be idempotent — processing the same message twice must produce the same outcome as processing it once.

Natural Idempotency

Some operations are naturally idempotent:

public async Task Handle(UpdateOrderStatusCommand command)
{
    // Setting a status is idempotent — doing it twice has no extra effect
    await _repository.SetStatus(command.OrderId, command.Status);
}

Idempotency Keys

For operations that aren’t naturally idempotent, track processed message IDs:

public class SendEmailHandler : IHandleCommand<SendEmailCommand>
{
    private readonly IEmailService _emailService;
    private readonly IProcessedMessageLog _log;

    public async Task Handle(SendEmailCommand command)
    {
        // Check if we've already processed this exact message
        if (await _log.AlreadyProcessed(command.MessageId))
        {
            return; // Duplicate — skip processing
        }

        await _emailService.Send(command.To, command.Subject, command.Body);

        // Record that we've processed this message
        await _log.Record(command.MessageId);
    }
}

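The idempotency check and the business action should ideally run in the same transaction. If they don’t, a failure between them leaves the two out of step: in the handler above, if the email is sent but recording the message ID then fails, the retry sends the email a second time.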
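
When a shared transaction isn't available, one common way to narrow that gap is to claim the message ID atomically before acting and to release the claim if the action fails. The following is a minimal sketch under assumptions: `InMemoryProcessedMessageLog` and `Idempotent.RunOnce` are hypothetical names, and the in-memory dictionary stands in for a durable store with a unique key on the message ID.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical in-memory log. A real one would be a durable store,
// for example a table with a unique constraint on the message ID.
public sealed class InMemoryProcessedMessageLog
{
    private readonly ConcurrentDictionary<Guid, bool> _claimed = new();

    // Atomically claims the ID. Returns false if it was already claimed.
    public bool TryClaim(Guid messageId) => _claimed.TryAdd(messageId, true);

    // Releases a claim so that a later redelivery can run the action again.
    public void Release(Guid messageId) => _claimed.TryRemove(messageId, out _);
}

public static class Idempotent
{
    // Runs the action once per message ID. Returns true if the action ran,
    // false if this delivery was a duplicate and was skipped.
    public static async Task<bool> RunOnce(
        InMemoryProcessedMessageLog log, Guid messageId, Func<Task> action)
    {
        if (!log.TryClaim(messageId))
        {
            return false; // duplicate delivery: skip
        }

        try
        {
            await action();
            return true;
        }
        catch
        {
            log.Release(messageId); // action failed: let the retry run it again
            throw;
        }
    }
}
```

Claiming first closes the window in which two concurrent deliveries both pass the check. It does not remove the trade-off entirely: with a durable store, a crash after the claim but before the action completes leaves the ID claimed, so the message is skipped on redelivery unless claims expire or are reconciled.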

Using Interceptors for Error Handling

Use inbound interceptors to centralise error handling logic across all handlers:

public class ErrorLoggingInterceptor : IInboundInterceptor
{
    private readonly ILogger _logger;
    private readonly IAlertService _alerts;
    private readonly int _maxDeliveryAttempts; // must match the transport's max delivery attempts

    public int Priority => 0;

    public async Task OnCommandHandlerError<TCommand>(
        TCommand command, NimbusMessage message, Exception exception)
        where TCommand : IBusCommand
    {
        _logger.Error(exception,
            "Handler failed for {CommandType} (attempt {Attempt}/{Max}) [{MessageId}]",
            typeof(TCommand).Name,
            message.DeliveryCount,
            _maxDeliveryAttempts,
            message.MessageId);

        // Alert on final delivery attempt
        if (message.DeliveryCount >= _maxDeliveryAttempts)
        {
            await _alerts.Send(
                $"Message dead-lettered: {typeof(TCommand).Name}",
                exception.ToString());
        }
    }

    // Implement no-ops for other methods...
}

Transient vs. Permanent Failures

Not all exceptions are equal. Design handlers to distinguish between failures that should retry and ones that shouldn’t:

public async Task Handle(PlaceOrderCommand command)
{
    try
    {
        await _orderRepository.Save(command);
    }
    catch (SqlException ex) when (ex.IsTransient)
    {
        // Transient DB error — let it throw, transport will retry
        throw;
    }
    catch (ValidationException ex)
    {
        // Permanent failure — retrying won't help
        // Log and swallow to avoid DLQ noise for bad data
        _logger.Warning(ex, "Invalid command data for order {OrderId}", command.OrderId);
        return; // Acknowledge the message — don't retry
    }
}

Swallowing exceptions (returning without throwing) tells Nimbus to acknowledge the message, removing it from the queue without retry. Do this only for permanent failures where retrying would never succeed.
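
The decision of which exceptions to swallow can be centralised so every handler applies the same rule. The following is a sketch under assumptions: `FailureClassifier` is a hypothetical helper, the exception list is illustrative, and `ValidationException` here is the one from `System.ComponentModel.DataAnnotations`, which may differ from the type your handlers use. Anything not listed is treated as retryable, so an unknown failure is never silently dropped.

```csharp
using System;
using System.ComponentModel.DataAnnotations;

public static class FailureClassifier
{
    // True only for failures that retrying cannot fix, such as invalid input.
    // Anything not listed here is treated as retryable and should propagate.
    public static bool IsPermanent(Exception exception) =>
        exception is ValidationException or FormatException or ArgumentException;
}
```

In a handler, `catch (Exception ex) when (FailureClassifier.IsPermanent(ex))` would log and return, while every other exception propagates to the transport for retry.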

Message Expiration

Messages can have a time-to-live (TTL). A message that is not processed within its TTL expires and, on transports with dead letter support, is dead-lettered:

// Set TTL when sending; relevant for time-sensitive commands
var deadline = DateTimeOffset.UtcNow.AddMinutes(15);
await bus.Send(
    new SendTimeSensitiveAlertCommand { AlertId = "ALERT-001", RelevantUntil = deadline },
    expiresAfter: deadline);

For time-sensitive messages, have the handler check whether the action is still relevant before acting:

public async Task Handle(SendTimeSensitiveAlertCommand command)
{
    if (command.RelevantUntil < DateTimeOffset.UtcNow)
    {
        _logger.Information("Alert {AlertId} is no longer relevant, skipping", command.AlertId);
        return; // Acknowledge and discard
    }

    await _alertService.Send(command.AlertId);
}

Next Steps

Interceptors — centralise error logging across all handlers
Transports — transport-specific retry and DLQ configuration
Testing — test error handling scenarios with the InProcess transport