---
title: "Mastering llms.txt: Advanced Next.js 15 Implementation"
slug: "llms-txt-advanced-nextjs-implementation"
published: "2025-10-21"
updated: "2025-12-25"
categories:
  - "Next.js"
tags:
  - "llms.txt"
  - "Next.js 15"
  - "Sanity CMS"
  - "structured metadata"
  - "SEO automation"
  - "web development"
  - "metadata management"
  - "RAG systems"
llm-intent: "reference"
audience-level: "intermediate"
framework-versions:
  - "next.js"
  - "sanity cms"
  - "typescript"
status: "stable"
llm-purpose: "Explore advanced llms.txt implementation in Next.js 15 and Sanity CMS. Gain insights on structured metadata and automation. Read now!"
llm-prereqs:
  - "Access to Next.js"
  - "Access to Sanity CMS"
  - "Access to TypeScript"
llm-outputs:
  - "Completed outcome: Explore advanced llms.txt implementation in Next.js 15 and Sanity CMS. Gain insights on structured metadata and automation. Read now!"
---

**Summary Triples**
- (Sanity schema, must include, LLM-focused fields (llmIntent, goal, difficulty, prereqs, outputs, machineSummary))
- (Markdown export endpoint, should produce, enriched Markdown with machine-friendly frontmatter for each post)
- (llms.txt manifest, should be, machine-readable and discoverable at /llms.txt listing manifests or per-article metadata endpoints)
- (Backfill tool, automates, validation and generation of missing LLM metadata using the Sanity API and optionally an LLM)
- (RAG systems, consume, structured manifests and enriched Markdown for better indexing and retrieval)
- (CI workflow, schedules, regular regeneration and publishing of llms.txt and per-article manifests)
- (Authors, should edit, LLM intent and goals in the CMS to avoid manual post-editing)
- (Exported frontmatter, must include, fields required by RAG (title, slug, publishedAt, llmIntent, machineSummary, prereqs, outputs))
- (Validation script, flags, incomplete records and optionally writes back generated metadata to Sanity)
- (Production rollout, requires, synchronization of robots/sitemap, llms.txt, and per-article manifests to avoid crawler mismatch)

### {GOAL}
Explore advanced llms.txt implementation in Next.js 15 and Sanity CMS. Gain insights on structured metadata and automation. Read now!

### {PREREQS}
- Access to Next.js
- Access to Sanity CMS
- Access to TypeScript

### {STEPS}
1. Enhance Sanity with LLM Metadata
2. Integrate New Metadata into Cache Layer
3. Build Advanced Markdown Documents
4. Publish JSON Specification and Manifests
5. Automate Metadata Validation

<!-- llm:goal="Explore advanced llms.txt implementation in Next.js 15 and Sanity CMS. Gain insights on structured metadata and automation. Read now!" -->
<!-- llm:prereq="Access to Next.js" -->
<!-- llm:prereq="Access to Sanity CMS" -->
<!-- llm:prereq="Access to TypeScript" -->
<!-- llm:output="Completed outcome: Explore advanced llms.txt implementation in Next.js 15 and Sanity CMS. Gain insights on structured metadata and automation. Read now!" -->

# Mastering llms.txt: Advanced Next.js 15 Implementation
> Explore advanced llms.txt implementation in Next.js 15 and Sanity CMS. Gain insights on structured metadata and automation. Read now!
Matija Žiberna · 2025-10-21

Last month I shared why llms.txt matters in [llms.txt Blueprint: Give AI Crawlers Instant Access](/llms-txt-why-it-matters) and walked through the baseline pipeline in [Implementing llms.txt in Next.js 15 with Sanity CMS](/llms-txt-nextjs-sanity-implementation). That foundation gave every article a Markdown twin, a discoverable llms.txt manifest, and sitemap/robots alignment. After shipping it in production I still ran into a familiar problem: LLM crawlers could find my Markdown, but they had no structured metadata, no machine index, and no automated way to stay fresh. This follow-up is the second phase—everything I added to make the pipeline resilient, self-updating, and consumable by RAG systems without manual intervention.

I’ll assume you already have the first phase in place (Sanity posts with `markdownContent`, cached helpers, `/blog/md/[slug]`, and `/llms.txt`). Now we’re going deeper: structured LLM metadata in Sanity, richer Markdown exports, machine-readable manifests, and tooling that validates or backfills the data automatically.

## Extend Sanity with LLM-Focused Metadata

The first upgrade lives in the CMS. I needed every post to declare its intent, goal, difficulty, prereqs, outputs, and machine-friendly summaries. Instead of hard-coding that later, I pushed it into the schema so authors can manage it where the content lives.

```typescript
// File: src/lib/sanity/schemaTypes/postType.ts
defineField({
  name: 'llmIntent',
  title: 'Primary LLM Intent',
  type: 'string',
  options: {
    list: [
      { title: 'How-To', value: 'how-to' },
      { title: 'Reference', value: 'reference' },
      { title: 'Case Study', value: 'case-study' },
      { title: 'Strategy', value: 'strategy' },
      { title: 'Release Notes', value: 'release-notes' },
      { title: 'Troubleshooting', value: 'troubleshooting' },
    ],
  },
  description: 'Classify the article so LLM agents understand the content shape.',
}),
defineField({
  name: 'llmSummaryTriples',
  title: 'LLM Summary Triples',
  type: 'array',
  of: [{
    type: 'object',
    fields: [
      defineField({ name: 'subject', type: 'string', validation: (Rule) => Rule.required() }),
      defineField({ name: 'predicate', type: 'string', validation: (Rule) => Rule.required() }),
      defineField({ name: 'object', type: 'string', validation: (Rule) => Rule.required() }),
    ],
  }],
  description: 'Structured key facts in (subject, predicate, object) form for deterministic extraction.',
}),
defineField({
  name: 'llmApiPrompts',
  title: 'LLM API Prompts',
  type: 'array',
  of: [{
    type: 'object',
    fields: [
      defineField({ name: 'question', type: 'text', rows: 2, validation: (Rule) => Rule.required() }),
      defineField({ name: 'answer', type: 'text', rows: 4, validation: (Rule) => Rule.required() }),
      defineField({ name: 'confidence', type: 'number', validation: (Rule) => Rule.min(0).max(1) }),
    ],
  }],
  description: 'Pre-baked Q/A snippets agents can return when queries align with this post.',
}),
```

I repeated the pattern for `audienceLevel`, `frameworkVersions`, `contentStatus`, `validatedAt`, `llmGoal`, `llmPrerequisites`, and `llmOutputs`. Each field has validation rules and editor-facing descriptions so authors know how to fill them out. After running `pnpm generate:types`, every query is type-safe and ready for consumption.
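Since most of those fields share the same shape, a small factory can stamp out the repeated definitions. This is an illustrative sketch of the pattern, not code from the repo — the helper name `llmStringField` is mine, and it returns a plain object shaped like a Sanity field definition (the real schema wraps these in `defineField`):

```typescript
// Hypothetical helper for the repeated string fields (llmGoal,
// contentStatus, …): returns a plain object matching the shape Sanity's
// defineField expects, so each call site only states what differs.
type StringFieldDef = {
  name: string
  title: string
  type: 'string'
  description: string
}

export function llmStringField(
  name: string,
  title: string,
  description: string,
): StringFieldDef {
  return { name, title, type: 'string', description }
}

// Example:
// defineField(llmStringField('llmGoal', 'LLM Goal',
//   'One-sentence goal agents can quote back to users.'))
```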

## Push Metadata Through the Cache Layer

With new schema fields in place, the Sanity cache helpers needed to expose them. Expanding the existing query keeps downstream routes and manifests in sync with a single fetch.

```typescript
// File: src/lib/sanity/queries/queries.ts
export const MARKDOWN_POSTS_QUERY = defineQuery(`*[_type == "post" && defined(slug.current)] | order(publishedAt desc) {
  _id,
  title,
  slug,
  publishedAt,
  dateModified,
  _updatedAt,
  excerpt,
  keywords,
  audienceLevel,
  frameworkVersions,
  contentStatus,
  validatedAt,
  llmIntent,
  llmGoal,
  llmPrerequisites,
  llmOutputs,
  llmSummaryTriples[]{
    subject,
    predicate,
    object
  },
  llmApiPrompts[]{
    question,
    answer,
    confidence
  },
  "hasMarkdown": defined(markdownContent) && markdownContent != "",
  "categories": categories[]->{
    title,
    slug,
    llmLabel,
    llmDescription
  },
  "primaryCategory": categories[0]->{
    title,
    slug,
    llmLabel,
    llmDescription
  },
  steps[]{
    name,
    text
  }
}`)
```

The cached helper in `src/lib/sanity/post-cache.ts` now returns `MarkdownPostSummary` objects with every LLM field, so any route can call `getMarkdownPosts()` and get the enriched dataset from cache instead of hitting Sanity repeatedly.
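The cache helper itself isn't shown here; the production version leans on Next.js caching primitives, but the contract is simple — repeated calls within a window reuse the first fetch instead of hitting Sanity again. A framework-free sketch of that idea:

```typescript
// Minimal in-memory TTL memoizer sketch. This stands in for the real
// helper's contract: calls within ttlMs share one in-flight/completed
// fetch; failures are not cached so the next call retries.
export function cached<T>(
  fetcher: () => Promise<T>,
  ttlMs: number,
): () => Promise<T> {
  let value: Promise<T> | undefined
  let expiresAt = 0

  return () => {
    const now = Date.now()
    if (!value || now >= expiresAt) {
      expiresAt = now + ttlMs
      value = fetcher().catch((err) => {
        value = undefined // don't cache failures
        throw err
      })
    }
    return value
  }
}

// Example shape of the real helper:
// const getMarkdownPosts = cached(() => client.fetch(MARKDOWN_POSTS_QUERY), 60_000)
```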

## Emit Rich Front Matter and Machine Tags in Markdown

Phase one simply echoed the Markdown body. The upgraded builder creates a complete document for agents—YAML front matter, summary triples, goal/prereq sections, and a JSON payload for Q/A snippets.

```typescript
// File: src/lib/llm/markdown.ts
export function buildMarkdownDocument(post: MarkdownReadyPost): string {
  // Derived values (title, slug, dates, triples, machine tags, …) are
  // computed from `post` above this excerpt.
  const lines: string[] = []

  const frontMatter = buildYamlFrontMatter({
    title,
    slug,
    published: publishedDate,
    updated: updatedDate,
    validated: validatedDate,
    categories: categoryLabels,
    tags: keywords,
    'llm-intent': intent,
    'audience-level': audienceLevel,
    'framework-versions': frameworkVersions,
    status: post.contentStatus,
    'llm-purpose': goalStatement,
    'llm-prereqs': prerequisites,
    'llm-outputs': llmOutputs,
  })

  lines.push(frontMatter, '')
  lines.push('**Summary Triples**')
  summaryTriples.forEach((triple) => lines.push(triple))
  lines.push('', '### {GOAL}', effectiveGoal, '', '### {PREREQS}')
  prerequisites.length
    ? prerequisites.forEach((item) => lines.push(`- ${sanitizeSingleLine(item)}`))
    : lines.push('- Familiarity with the concepts discussed in this article.')
  lines.push('', '### {STEPS}')
  stepLines.forEach((entry) => lines.push(entry))
  lines.push('')
  machineTags.forEach((tag) => lines.push(tag))
  lines.push('', `# ${title}`)
  // …rest of body and LLM response snippet…
  return lines.join('\n')
}
```

The route at `src/app/(non-intl)/blog/md/[slug]/route.ts` now imports this builder and returns a fully annotated Markdown file. Crawlers get context, triples, machine tags, and a JSON snippet without post-processing, and human readers still see the original Markdown body.
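The `buildYamlFrontMatter` helper referenced in the builder isn't shown. A minimal sketch — my own, assuming only the string and string-array value shapes that appear in the fields above — could look like this:

```typescript
// Minimal YAML front matter serializer sketch: supports string and
// string[] values, skips empty/undefined ones, and quotes scalars via
// JSON.stringify (valid YAML for plain strings). The real helper may
// handle more shapes; this covers the fields shown above.
type FrontMatterValue = string | string[] | undefined

export function buildYamlFrontMatter(
  fields: Record<string, FrontMatterValue>,
): string {
  const lines: string[] = ['---']
  for (const [key, value] of Object.entries(fields)) {
    if (value === undefined) continue
    if (Array.isArray(value)) {
      if (value.length === 0) continue
      lines.push(`${key}:`)
      value.forEach((item) => lines.push(`  - ${JSON.stringify(item)}`))
    } else {
      lines.push(`${key}: ${JSON.stringify(value)}`)
    }
  }
  lines.push('---')
  return lines.join('\n')
}
```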

## Publish Machine-Readable Manifests and Corpus Dumps

LLM ingest pipelines love having a single place to pull structured entries. I added two new routes that piggyback on the same cache helper.

```typescript
// File: src/app/(non-intl)/blog/md/index.json/route.ts
export async function GET() {
  const posts = await getMarkdownPosts()

  // Shorthand fields below (slug, title, dates, categories, …) are
  // derived per post; the helper calls are elided for brevity.
  const manifest = posts.map((post) => ({
    slug,
    title,
    url: `${siteOrigin}/blog/md/${slug}`,
    publishedAt,
    updatedAt,
    validatedAt,
    status: post.contentStatus,
    intent: post.llmIntent,
    audienceLevel: post.audienceLevel ?? post.difficulty,
    goal: post.llmGoal,
    categories,
    tags,
    frameworkVersions,
    prerequisites,
    outputs,
  }))

  return Response.json(manifest, {
    headers: { 'Cache-Control': `public, max-age=0, s-maxage=${revalidate}` },
  })
}
```

```typescript
// File: src/app/(non-intl)/llm/corpus.ndjson/route.ts
export async function GET() {
  const summaries = await getMarkdownPosts()
  const records: string[] = []

  for (const summary of summaries) {
    const slug = resolveSlugValue(summary.slug)
    if (!slug) continue

    const post = await getPostBySlug(slug)
    if (!post?.markdownContent) continue

    const markdown = buildMarkdownDocument(post as MarkdownReadyPost)

    // Shorthand fields (categories, tags, prerequisites, summaryTriples,
    // prompts, …) are derived from `post` above; derivation elided.
    records.push(JSON.stringify({
      slug,
      title: post.title,
      url: `${siteOrigin}/blog/md/${slug}`,
      intent: post.llmIntent,
      audienceLevel: post.audienceLevel ?? post.difficulty,
      status: post.contentStatus,
      publishedAt: formatIsoDate(post.publishedAt),
      updatedAt: formatIsoDate(post.dateModified ?? post._updatedAt ?? post.publishedAt),
      validatedAt: formatIsoDate(post.validatedAt),
      categories,
      tags,
      frameworkVersions,
      prerequisites,
      outputs,
      goal: post.llmGoal,
      summaryTriples,
      responses: prompts,
      body: markdown,
    }))
  }

  return new Response(records.join('\n'), {
    headers: {
      'Content-Type': 'application/x-ndjson; charset=utf-8',
      'Cache-Control': `public, max-age=0, s-maxage=${revalidate}`,
    },
  })
}
```

The first route gives you a compact JSON manifest; the second streams the full NDJSON corpus with Markdown included. Both are cached at the CDN, so crawlers can pull the whole corpus in a single request.
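On the consuming side, a RAG indexer only needs to split the NDJSON response line by line. A minimal sketch of that consumer — the record type is abbreviated here; the full records carry every field the corpus route emits:

```typescript
// Sketch of an NDJSON consumer: one JSON record per line, blank lines
// ignored. Field names mirror the corpus route; the abbreviated type
// keeps only what an indexer minimally needs.
type CorpusRecord = {
  slug: string
  title: string
  body: string
}

export function parseCorpus(ndjson: string): CorpusRecord[] {
  return ndjson
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as CorpusRecord)
}

// Usage (URL illustrative):
// const res = await fetch('https://example.com/llm/corpus.ndjson')
// const records = parseCorpus(await res.text())
```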

## Expose Structured Service Specs

Product pages needed the same treatment. Instead of hardcoding pricing or model details, I centralized them and published JSON specs alongside the human pages.

```typescript
// File: src/data/service-specs.ts
export const serviceSpecs = [
  {
    slug: 'web-app-development',
    title: 'Productized Web App Development',
    url: 'https://buildwithmatija.com/services/web-app-development',
    specPath: '/services/web-app-development/spec.json',
    summary: 'Fractional CTO partnership to scope, build, and operate complex web applications.',
    pricingModel: 'retainer',
    pricingNotes: 'Monthly partnership starting at €4.5k with minimum 4-week engagement.',
    engagementModel: 'Hands-on fractional CTO and engineering lead delivering sprint-based outcomes.',
    sla: 'Weekly roadmap reviews, 1 business day response times, production hotfix within 12 hours.',
    useCases: ['Ship investor-ready MVPs with production-quality foundations', 'Stabilize or refactor aging Next.js/Node stacks', 'Automate internal workflows with custom portals and APIs'],
    deliverables: ['Technical architecture & deployment plan', 'Production-ready Next.js/TypeScript implementation', 'CI/CD automation with observability hooks', 'Operational handbook for handover'],
    techStack: ['Next.js 15', 'TypeScript', 'Prisma', 'PostgreSQL', 'Vercel', 'Sanity CMS'],
    contactCta: 'https://buildwithmatija.com/contact',
  },
  // …other services…
]
```

Each spec route simply wraps that payload with cache headers (`src/app/(non-intl)/services/[slug]/spec.json/route.ts` and `src/app/(non-intl)/mvp/spec.json/route.ts`). The main landing page pulls from the same dataset, so the human view and machine endpoint share one source of truth.
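The wrapper route is thin. A sketch of what such a `spec.json/route.ts` might contain — with the spec dataset inlined here for illustration (the real route imports it from `src/data/service-specs.ts`), and using Next 15's async `params`:

```typescript
// Sketch of a spec.json route handler: look up the spec by slug and
// return it with CDN-friendly cache headers. `serviceSpecs` is a
// stand-in for the shared dataset; Request/Response are the standard
// Fetch API types Next.js route handlers use.
const serviceSpecs = [
  { slug: 'web-app-development', title: 'Productized Web App Development' },
]

export async function GET(
  _req: Request,
  { params }: { params: Promise<{ slug: string }> },
) {
  const { slug } = await params
  const spec = serviceSpecs.find((entry) => entry.slug === slug)
  if (!spec) return new Response('Not found', { status: 404 })

  return Response.json(spec, {
    headers: { 'Cache-Control': 'public, max-age=0, s-maxage=3600' },
  })
}
```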

## Let Crawlers Know About the New Endpoints

`robots.ts` now whitelists the manifest, corpus, and spec routes for both generic bots and popular LLM crawlers.

```typescript
// File: src/app/robots.ts
const llmAgents = ['GPTBot', 'ClaudeBot', 'anthropic-ai', 'PerplexityBot', 'Googlebot']

return {
  sitemap: `${baseUrl}/sitemap.xml`,
  rules: [
    {
      userAgent: '*',
      allow: [
        '/',
        '/llms.txt',
        '/blog/md/index.json',
        '/llm/corpus.ndjson',
        '/services/web-app-development/spec.json',
        '/services/seo-friendly-websites/spec.json',
        '/services/single-purpose-tools/spec.json',
        '/mvp/spec.json',
      ],
      disallow: ['/studio', '/api/', '/wp-admin'],
    },
    ...llmAgents.map((userAgent) => ({
      userAgent,
      allow: [
        '/',
        '/llms.txt',
        '/blog/md/index.json',
        '/llm/corpus.ndjson',
        '/services/web-app-development/spec.json',
        '/services/seo-friendly-websites/spec.json',
        '/services/single-purpose-tools/spec.json',
        '/mvp/spec.json',
      ],
    })),
  ],
}
```

Between this and the upgraded sitemap, every machine entry point is obvious: llms.txt, the JSON manifest, the NDJSON corpus, and the service specs.

## Automate Backfill and Validation

The last mile was operational. Structured fields only help if they’re always filled in, so I added two scripts.

The validator fails the build when a post is missing intent, goal, triples, prompts, framework versions, prerequisites, or outputs.

```typescript
// File: scripts/validate-llm-content.ts
const records = await client.fetch<MarkdownPostSummary[]>(MARKDOWN_POSTS_QUERY)
records.forEach((post) => {
  const slug = resolveSlugValue(post.slug) ?? '(missing-slug)'

  if (!ensureString(post.llmIntent)) issues.push({ slug, message: 'llmIntent is missing.' })
  if (!ensureString(post.llmGoal)) issues.push({ slug, message: 'llmGoal is missing.' })
  if (!ensureString(post.contentStatus)) issues.push({ slug, message: 'contentStatus is missing.' })
  if (!ensureString(post.validatedAt)) issues.push({ slug, message: 'validatedAt is missing.' })

  if (!Array.isArray(post.llmPrerequisites) || post.llmPrerequisites.length === 0)
    issues.push({ slug, message: 'llmPrerequisites list is empty.' })
  if (!Array.isArray(post.frameworkVersions) || post.frameworkVersions.length === 0)
    issues.push({ slug, message: 'frameworkVersions list is empty.' })
  if (!Array.isArray(post.llmSummaryTriples) || post.llmSummaryTriples.length === 0)
    issues.push({ slug, message: 'Missing llmSummaryTriples entries.' })
  if (!Array.isArray(post.llmApiPrompts) || post.llmApiPrompts.length === 0)
    issues.push({ slug, message: 'Missing llmApiPrompts entries.' })
})
```

`pnpm validate:llm` runs it with `tsx`, so CI or local builds fail fast when something’s missing.
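After collecting issues, the script reports them and exits non-zero so CI actually fails. A sketch of that reporting tail (the function name is mine):

```typescript
// Sketch of the validator's reporting tail: group issues by slug,
// print each post's problems, and return an exit code for process.exit.
type Issue = { slug: string; message: string }

export function reportIssues(issues: Issue[]): number {
  if (issues.length === 0) {
    console.log('All posts have complete LLM metadata.')
    return 0
  }
  const bySlug = new Map<string, string[]>()
  for (const { slug, message } of issues) {
    const list = bySlug.get(slug) ?? []
    list.push(message)
    bySlug.set(slug, list)
  }
  for (const [slug, messages] of bySlug) {
    console.error(`✖ ${slug}`)
    messages.forEach((m) => console.error(`  - ${m}`))
  }
  return 1
}

// process.exit(reportIssues(issues))
```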

For gaps, the backfill script uses GPT-5-mini to generate the metadata. You can run it in dry mode or force regeneration when you want fresh triples.

```bash
pnpm backfill:llm --dry-run --force --slug migrate-docker-containers-between-vps
pnpm backfill:llm --limit 10
pnpm backfill:llm
```

For each post it sends a truncated summary and gets back intent, goal, frameworks, prerequisites, outputs, triples, and Q/A. It also auto-adds `_key` values so Sanity Studio can edit the arrays immediately. If the model leaves anything blank, the script falls back to existing metadata or inferred defaults (tools become frameworks, goal surfaces as fallback output, difficulty maps to audience level). Combined with the validator, this keeps the whole corpus consistent without a weekly manual audit.
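Two of those details are easy to show in isolation: stamping `_key` values onto array items so Studio can edit them, and falling back to legacy fields when the model returns nothing. A sketch under those assumptions (helper names are mine):

```typescript
import { randomUUID } from 'node:crypto'

// Sketch: ensure every array item written back to Sanity has a `_key`
// (required for Studio array editing), and resolve audienceLevel with
// the fallback chain described above: generated value, then existing
// metadata, then the legacy `difficulty` field mapped straight across.
export function withKeys<T extends object>(items: T[]): (T & { _key: string })[] {
  return items.map((item) => ({ _key: randomUUID().slice(0, 12), ...item }))
}

export function resolveAudienceLevel(
  generated: string | undefined,
  existing: string | undefined,
  legacyDifficulty: string | undefined,
): string | undefined {
  return generated || existing || legacyDifficulty
}
```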

## Wrap It Up with Documentation

The `src/app/llms.txt/README.md` file now documents the whole pipeline—Sanity schema fields, manifest routes, NDJSON corpus, backfill command, and validation checklist. Having historical memory in the repo helps new contributors understand why each piece exists and how to extend it safely.

## Conclusion

The first phase gave us Markdown mirrors and llms.txt; the second phase makes the entire pipeline structured, discoverable, and self-maintaining. Sanity stores the metadata, Next.js renders richer Markdown, manifests expose machine-friendly snapshots, service specs have JSON siblings, robots/sitemaps broadcast the endpoints, and tooling backfills or validates everything automatically. By the end of this guide you can let crawlers grab a coherent corpus—Markdown front matter, triples, JSON snippets, even pricing data for services—without scraping or manual exports.

Let me know in the comments if you have questions, and subscribe for more practical development guides.

Thanks,
Matija

## LLM Response Snippet
```json
{
  "goal": "Explore advanced llms.txt implementation in Next.js 15 and Sanity CMS. Gain insights on structured metadata and automation. Read now!",
  "responses": [
    {
      "question": "What does the article \"Mastering llms.txt: Advanced Next.js 15 Implementation\" cover?",
      "answer": "Explore advanced llms.txt implementation in Next.js 15 and Sanity CMS. Gain insights on structured metadata and automation. Read now!"
    }
  ]
}
```