Payload CMS Logging: Queue-Based Production Best Practices
Design a resilient queue-based logging system in Payload CMS using a dedicated logs collection and Jobs queue.

If you are building anything non-trivial in Payload CMS, logging is one of the first things that looks simple and then starts hurting under real traffic. I ran into this while scaling request and hook-heavy workflows, and the fix was not just code changes. The important part was making the architecture decision first.
This guide is intentionally problem-aware before it is implementation-heavy: why queue-based logging matters in Payload, what goes wrong when you do not do it, and what the architecture should look like before you write a line of code.
Only after that do we implement the queue-based approach using a dedicated logs collection and Payload Jobs.
The Problem Before the Code
Payload makes it easy to write logs directly, but synchronous logging in request paths does not scale well. The symptoms are predictable:
- request blocking: API and hook execution time increases because each log write is in-band
- data loss under pressure: failures or timeouts during incidents can drop the very logs you need
- collection bloat and noise: operational events mixed with domain data become harder to manage and query
In practice, your API response time gets tied to log writes. Hooks feel slower than expected because each error/info payload waits on persistence. During dependency or DB instability, logging itself can fail and create secondary failures exactly when you need telemetry the most.
There is also a schema problem. If logs are mixed into business collections or scattered ad hoc, operational data becomes noisy and hard to query. You lose clean boundaries between domain data and platform observability.
The real issue is architectural: logging is an operational workload, but synchronous logging treats it like inline business logic. That is why this decision needs to be made before implementation details.
Architecture Decision: Synchronous vs Queue-Based Logging
You should explicitly choose between two models.
Synchronous logging is simpler to start with. It is fine for very low traffic or one-off scripts where latency and failure isolation do not matter.
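For contrast, the synchronous pattern is a direct, in-band write through Payload's Local API: the handler cannot respond until the insert completes. A purely illustrative snippet (collection slug and field values mirror the setup built later in this article):

```typescript
// Synchronous logging (the pattern to avoid in production request paths):
// the request blocks until the log row is persisted
await req.payload.create({
  collection: 'logs',
  data: {
    level: 'info',
    source: 'api',
    description: 'Payment processed',
  },
});
```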
Queue-based logging adds one more moving part, but it separates request execution from log persistence. That tradeoff is usually correct for production Payload systems because it gives you reliability boundaries:
- requests and hooks stay fast
- logging failures do not break business workflows
- log persistence can be retried and monitored independently
For a production setup, the target architecture is:
- a dedicated logs collection for structured log data
- a dedicated Jobs queue (logs) for the persistence workload
- a dedicated task (persistLog) that writes queued input to the collection
- a safe enqueue utility used everywhere instead of legacy synchronous calls
Once this decision is made, implementation becomes straightforward. But before jumping to code, lock in a few design rules.
Non-Negotiable Design Rules
If you adopt queue-based logging, enforce these rules from day one.
First, queueing logs must never break business flow. Logging is important, but it should not be allowed to take down order creation, webhook handling, or admin actions.
Second, every log payload must be normalized before enqueue. That includes bounded string lengths, safe JSON serialization, and stable field names.
Third, use one dedicated queue for logs. Mixing logging with unrelated tasks makes priority and capacity planning harder.
Fourth, design retention early. Logs are high-volume by nature. If you do not define retention windows and archival rules up front, cost and query performance will degrade.
Architecture Checklist Before Coding
Use this checklist before implementation starts:
- Have we agreed that synchronous request-path logging is not our production default?
- Do we have a dedicated logs collection schema?
- Do we have a dedicated Jobs queue for logs?
- Do we have a dedicated persistence task contract (input and failure behavior)?
- Do we have fallback behavior if enqueue fails?
- Do we have retention and queue monitoring defined?
If any answer is "no," resolve it before writing utilities. Now let's implement.
Step 1: Define a Dedicated Logs Collection
Create a collection designed for operational records, not business entities.
// File: src/collections/Logs/index.ts
import { superAdminOrTenantAdminAccess } from '@/access/superAdminOrTenantAdmin';
import type { CollectionConfig } from 'payload';

export const Logs: CollectionConfig = {
  slug: 'logs',
  labels: {
    singular: 'Log',
    plural: 'Logs',
  },
  admin: {
    group: 'System & Logs',
    useAsTitle: 'description',
    // Note: the 'tenant' column assumes a tenant field added elsewhere (e.g. by a multi-tenant plugin)
    defaultColumns: ['timestamp', 'level', 'source', 'description', 'tenant'],
    description: 'Store arbitrary JSON data, error information, and metadata for debugging and auditing purposes.',
    components: {
      Description: '/src/components/payload/custom/CollectionDescription',
    },
    hidden: false,
  },
  access: {
    read: superAdminOrTenantAdminAccess,
    create: superAdminOrTenantAdminAccess,
    update: superAdminOrTenantAdminAccess,
    delete: superAdminOrTenantAdminAccess,
  },
  fields: [
    {
      name: 'timestamp',
      type: 'date',
      label: 'Timestamp',
      required: true,
      defaultValue: () => new Date().toISOString(),
      admin: {
        description: 'When the log entry was created (auto-set)',
        readOnly: true,
      },
    },
    {
      name: 'level',
      type: 'select',
      label: 'Log Level',
      required: true,
      defaultValue: 'info',
      options: [
        { label: 'Debug', value: 'debug' },
        { label: 'Info', value: 'info' },
        { label: 'Warning', value: 'warning' },
        { label: 'Error', value: 'error' },
      ],
      admin: {
        description: 'Severity level of the log entry',
      },
    },
    {
      name: 'source',
      type: 'select',
      label: 'Source',
      required: true,
      defaultValue: 'manual',
      options: [
        { label: 'Webhook', value: 'webhook' },
        { label: 'Hook', value: 'hook' },
        { label: 'API', value: 'api' },
        { label: 'Migration', value: 'migration' },
        { label: 'Manual', value: 'manual' },
      ],
      admin: {
        description: 'Where the log entry originated from',
      },
    },
    {
      name: 'description',
      type: 'textarea',
      label: 'Description',
      admin: {
        description: 'Optional human-readable summary of the log entry',
      },
    },
    {
      name: 'data',
      type: 'json',
      label: 'Data',
      admin: {
        description: 'Arbitrary JSON data dump for debugging (no schema validation)',
      },
    },
    {
      name: 'errorMessage',
      type: 'text',
      label: 'Error Message',
      admin: {
        description: 'Optional error message if this is an error log',
      },
    },
    {
      name: 'errorLocation',
      type: 'text',
      label: 'Error Location',
      admin: {
        description: 'Optional location where error occurred (file:line format)',
      },
    },
  ],
};
This gives you stable, queryable structure for operational events while keeping logs decoupled from business tables. It also establishes a clear contract for what the queue task should persist.
Step 2: Route Log Writes Through the Jobs Queue
Now implement queue-first logging utilities and stop writing logs inline in request/hook paths.
// File: src/utilities/createLog.ts
import { getPayloadClient } from '@/lib/payloadClient';
import type { Payload, PayloadRequest } from 'payload';

export interface LogEntry {
  level: 'debug' | 'info' | 'warning' | 'error';
  source: 'webhook' | 'hook' | 'api' | 'migration' | 'manual';
  description?: string;
  data?: Record<string, any>;
  errorMessage?: string;
  errorLocation?: string;
  tenant: number; // Tenant ID only
}

const LOG_PERSIST_TASK_SLUG = 'persistLog';
const LOG_QUEUE_NAME = 'logs';

function fallbackConsoleLog(logEntry: Partial<LogEntry>, reason: string): void {
  try {
    const timestamp = new Date().toISOString();
    const level = logEntry?.level ?? 'info';
    const source = logEntry?.source ?? 'unknown';
    const description = logEntry?.description ?? 'No description';
    const tenant = logEntry?.tenant ?? 'unknown';
    console.log(
      `[Logging Fallback] [${timestamp}] [${level.toUpperCase()}] [${source}] tenant=${tenant} | ${description} | Reason: ${reason}`
    );
    if (logEntry?.data) {
      try {
        const dataStr = JSON.stringify(logEntry.data);
        const truncated = dataStr.length > 1000 ? dataStr.slice(0, 1000) + '...[truncated]' : dataStr;
        console.log(`[Logging Fallback] Data: ${truncated}`);
      } catch {
        console.log('[Logging Fallback] Data: [unable to serialize]');
      }
    }
  } catch {
    // Do nothing as absolute last resort
  }
}

function serializeForStorage(obj: any): any {
  if (!obj) return obj;
  try {
    const seen = new WeakSet();
    const serialized = JSON.parse(
      JSON.stringify(obj, (key, value) => {
        if (typeof value === 'object' && value !== null) {
          if (seen.has(value)) return '[Circular Reference]';
          seen.add(value);
        }
        if (value instanceof Error) {
          return {
            message: value.message,
            stack: value.stack,
            name: value.name,
          };
        }
        if (value instanceof Date) return value.toISOString();
        return value;
      })
    );
    return serialized;
  } catch (e) {
    console.warn('[Logging] Failed to serialize object, returning safe representation:', e);
    return {
      _serializationError: 'Failed to serialize full object',
      _type: typeof obj,
      _keys: Array.isArray(obj) ? `[${obj.length} items]` : Object.keys(obj || {}).slice(0, 10),
    };
  }
}

function buildSafeLogData(logEntry: LogEntry, safeTenant: number) {
  let serializedData: any;
  try {
    serializedData = logEntry.data ? serializeForStorage(logEntry.data) : undefined;
  } catch {
    serializedData = { _error: 'Failed to serialize data' };
  }
  return {
    timestamp: new Date().toISOString(),
    level: (logEntry.level || 'info') as 'debug' | 'info' | 'warning' | 'error',
    source: (logEntry.source || 'api') as 'webhook' | 'hook' | 'api' | 'migration' | 'manual',
    description: String(logEntry.description || '').slice(0, 10000),
    data: serializedData,
    errorMessage: logEntry.errorMessage ? String(logEntry.errorMessage).slice(0, 5000) : undefined,
    errorLocation: logEntry.errorLocation ? String(logEntry.errorLocation).slice(0, 1000) : undefined,
    tenant: safeTenant,
  };
}

export interface QueueLogResult {
  queued: boolean;
}

export type QueueHookLogEntry = Omit<LogEntry, 'source' | 'tenant'>;

export async function queueLog(
  req: PayloadRequest | undefined,
  logEntry: LogEntry
): Promise<QueueLogResult> {
  try {
    if (!logEntry || typeof logEntry !== 'object') {
      fallbackConsoleLog({}, 'Invalid logEntry provided for queueing');
      return { queued: false };
    }
    const safeTenant = typeof logEntry.tenant === 'number' && !isNaN(logEntry.tenant)
      ? logEntry.tenant
      : 1;
    let payload: Payload | null = null;
    try {
      if (req?.payload && typeof req.payload.create === 'function') {
        payload = req.payload;
      } else {
        payload = await getPayloadClient();
      }
    } catch (clientErr) {
      fallbackConsoleLog(
        { ...logEntry, tenant: safeTenant },
        `Failed to get Payload client for queueing: ${clientErr instanceof Error ? clientErr.message : String(clientErr)}`
      );
      return { queued: false };
    }
    if (!payload || !payload.jobs || typeof payload.jobs.queue !== 'function') {
      fallbackConsoleLog(
        { ...logEntry, tenant: safeTenant },
        'No valid payload jobs queue available'
      );
      return { queued: false };
    }
    const queueInput = buildSafeLogData({ ...logEntry, tenant: safeTenant }, safeTenant);
    await payload.jobs.queue({
      task: LOG_PERSIST_TASK_SLUG,
      input: queueInput,
      queue: LOG_QUEUE_NAME,
    });
    return { queued: true };
  } catch (err) {
    const errorMsg = err instanceof Error ? err.message : String(err);
    fallbackConsoleLog(logEntry || {}, `Failed to queue log entry: ${errorMsg}`);
    return { queued: false };
  }
}

export async function queueLogHook(
  req: PayloadRequest,
  entry: QueueHookLogEntry
): Promise<void> {
  const result = await queueLog(req, {
    ...entry,
    source: 'hook',
    tenant: 1,
  });
  if (!result.queued) {
    req.payload.logger.error({
      msg: 'Failed to queue hook log',
      description: entry.description,
      errorLocation: entry.errorLocation,
    });
  }
}
This code does four production-critical things. It normalizes input into a stable log schema, serializes unsafe payload data defensively, enqueues logs into a dedicated queue, and never lets logging failures crash the main flow. That is the practical reliability improvement over legacy synchronous logging.
With this in place, every API route and hook can call queueLog or queueLogHook and stay non-blocking.
Step 3: Use Queue Logging in Request/Hook Paths
Keep usage simple and consistent so teams do not drift back to inline writes.
// File: src/app/api/example/route.ts
import { queueLog } from '@/utilities/createLog';

// Inside your route handler:
await queueLog(req, {
  level: 'info',
  source: 'api',
  description: 'Order webhook processed',
  data: { orderId: 12345 },
  tenant: 1,
});

// File: src/collections/Orders/hooks/example.ts
import { queueLogHook } from '@/utilities/createLog';

// Inside an afterChange hook:
await queueLogHook(req, {
  level: 'error',
  description: 'afterChange failed to enqueue email task',
  errorLocation: 'orders/afterChange.ts:42',
  data: { orderId: doc.id },
});
At this point, your implementation is aligned with the architecture decision: operational telemetry is off the hot path and persisted asynchronously.
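One piece the steps above reference but do not show is the persistLog task itself. A minimal registration sketch, assuming Payload 3's Jobs Queue configuration; the input schema mirrors the Logs collection fields, and the retry count and autoRun schedule are illustrative values to tune for your traffic:

```typescript
// File: payload.config.ts (jobs section only, illustrative)
import { buildConfig } from 'payload';

export default buildConfig({
  // ...collections, db adapter, secret, etc.
  jobs: {
    tasks: [
      {
        slug: 'persistLog', // must match LOG_PERSIST_TASK_SLUG in createLog.ts
        retries: 2, // illustrative: retry transient DB failures before marking the job failed
        inputSchema: [
          { name: 'timestamp', type: 'date', required: true },
          { name: 'level', type: 'text', required: true },
          { name: 'source', type: 'text', required: true },
          { name: 'description', type: 'textarea' },
          { name: 'data', type: 'json' },
          { name: 'errorMessage', type: 'text' },
          { name: 'errorLocation', type: 'text' },
          { name: 'tenant', type: 'number' },
        ],
        handler: async ({ input, req }) => {
          // Persist the normalized queue input into the dedicated logs collection
          await req.payload.create({
            collection: 'logs',
            data: input,
          });
          return { output: {} };
        },
      },
    ],
    // Drain the dedicated queue on a schedule; tune cron/limit to your volume
    autoRun: [{ cron: '* * * * *', queue: 'logs', limit: 100 }],
  },
});
```

With this registered, every job queued by queueLog is picked up from the logs queue and written to the collection independently of the original request.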
Operational Considerations After Deploy
A queue-based design solves request-path strain, but production quality depends on operations.
First, define retention. Logs are operational data, so set policy by value and cost instead of keeping everything forever. If a class of logs has no debugging or audit value after a time window, expire or archive it.
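A retention job can be a simple scheduled cleanup. A sketch, assuming the same getPayloadClient helper used in createLog.ts and Payload's Local API bulk delete by where; the 30-day window is an illustrative policy, not a recommendation:

```typescript
// File: src/tasks/pruneLogs.ts (illustrative)
import { getPayloadClient } from '@/lib/payloadClient';

const RETENTION_DAYS = 30; // illustrative: set per log class and audit requirements

export async function pruneOldLogs(): Promise<void> {
  const payload = await getPayloadClient();
  const cutoff = new Date(Date.now() - RETENTION_DAYS * 24 * 60 * 60 * 1000).toISOString();
  // Bulk delete everything older than the cutoff
  await payload.delete({
    collection: 'logs',
    where: { timestamp: { less_than: cutoff } },
  });
}
```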
Second, monitor the queue itself. Your logging system is now a pipeline, so backlog depth and processing latency are the key health signals. If queue depth grows faster than workers drain it, your "working" logging setup is already degraded.
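A minimal depth check can query the internal collection Payload creates for the Jobs queue. The collection slug and field names here (payload-jobs, queue, completedAt) follow Payload's jobs schema as I understand it, so verify them against your version before relying on this:

```typescript
// File: src/utilities/logsQueueDepth.ts (illustrative)
import { getPayloadClient } from '@/lib/payloadClient';

// Number of queued-but-unprocessed jobs in the logs queue.
// Alert when this grows faster than workers drain it.
export async function getLogsQueueDepth(): Promise<number> {
  const payload = await getPayloadClient();
  const { totalDocs } = await payload.count({
    collection: 'payload-jobs', // internal collection backing the Jobs queue
    where: {
      queue: { equals: 'logs' },
      completedAt: { exists: false },
    },
  });
  return totalDocs;
}
```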
Third, plan for backlog behavior. During incident spikes, queue delay will increase. That is expected. What matters is that business requests still succeed and logs eventually persist once pressure drops. This is exactly why queue isolation is worth the extra moving part.
Conclusion
The problem was never just "how to write a log in Payload." The real production problem was coupling log persistence to request execution. The solution is a design choice first: dedicated logs collection plus a dedicated Jobs queue and task for asynchronous persistence.
With this architecture, you can keep request paths fast, preserve logs through transient failures, and operate logging as a system instead of a helper function.
Let me know in the comments if you have questions, and subscribe for more practical development guides.
Thanks, Matija