Metadata Tagger

Your precision indexer. The Metadata Tagger Agent reads any article, transcript, or content block and extracts clean, structured metadata. It ensures every piece of content is consistently tagged with the right people, places, topics, and entities — so your archives stay organized, searchable, and context-aware.


What It Does

This Agent analyzes source content and produces structured metadata fields: Topics, Categories, Persons, Organizations, Locations, Entities, and (if enabled) Sentiment. It applies strict tagging rules to ensure accuracy, consistency, and reliability — never inventing vague tags or breaking format.

Why It Matters

Metadata is the backbone of search, SEO, personalization, and editorial workflows. Without clean tagging, content gets lost in archives or misfiled in a content management system (CMS). This Agent makes metadata tagging scalable and trustworthy across thousands of articles.

When to Use It

Perfect for:

Bulk-indexing content for an internal CMS, archives, or knowledge bases
Powering content discovery through topics, categories, and entities
Identifying people, places, and organizations mentioned in articles
Building automated topic clusters or personalization feeds
Preparing metadata for newsletters or syndication partners

When Not to Use It

Avoid using this Agent:

For stylistic or creative text generation (it produces only metadata)
If you want summaries or headlines (use Summarizer or Headline Generator instead)
For subjective editorial judgments (it applies strict definitions, not opinions)

Inputs and Configuration

Basic Configuration

Field	Required?	Description
Source Content	✅	Any article, Data Stream result, or pasted content. Used as the tagging base.
Tag Types	Optional	Configuration of which metadata types to return. If missing or malformed, safe defaults are applied.

Advanced Configuration

Field	Required?	Description
Language Model	✅	Default = Claude 3.5
Brand Guidelines	Not used	This Agent ignores brand tone and style fields — outputs are strictly metadata.

Best Uses for Each Input Type

Free Text (pasted content)
- Best for: Ad-hoc tagging of raw notes or internal copy.
- Output: Metadata fields (topics, categories, people, etc.) neatly extracted.
Data Stream (curated content feed)
- Best for: Bulk processing large volumes of content for clustering or indexing.
- Output: Consistent tags across all streamed articles.
Source URL
- Best for: Direct tagging of external content or syndicated pieces.
- Output: Structured tags aligned to content themes and entities.
Workspace Article
- Best for: Tagging your own archive for newsletters or personalization.

Brand, Tone, and Rules

This Agent does not apply brand voice or style.
Output is always structured metadata, not prose.
No editorial spin, no stylistic adjustments.

Metadata Rules

Topics: Hierarchical subject matter (e.g., Society, Politics, Democracy)
Categories: Editorial groupings (e.g., Health, Faith, Lifestyle)
Persons: Named individuals (e.g., Joe Biden, Taylor Swift)
Organizations: Groups, agencies, or companies (e.g., FBI, Meta)
Locations: Cities, states, countries, or regions (e.g., Texas, Israel)
Entities: Abstract concepts, issues, or key ideas (e.g., AI Regulation, Gun Control)
Sentiment: Positive, Neutral, or Negative — only included if enabled.

Composition Rules

Always output in the fixed order: Topics, Categories, Persons, Organizations, Locations, Entities, Sentiment (if enabled).
Each section must appear even if empty.
Never rename headers or alter structure.
Lists should be comma-separated.
No invented or vague tags — only what is clearly present in the source.
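To make the composition rules concrete, here is a minimal Python sketch of a renderer that enforces them on the client side: fixed section order, every core section present even when empty, comma-separated values, and Sentiment only when enabled. The function name `render_metadata` and the convention of leaving an empty section blank after its colon are assumptions for illustration, not the Agent's documented implementation.

```python
from __future__ import annotations

# Fixed output order from the Composition Rules (Sentiment handled separately).
SECTION_ORDER = ["Topics", "Categories", "Persons", "Organizations", "Locations", "Entities"]


def render_metadata(tags: dict[str, list[str]], sentiment: str | None = None) -> str:
    """Hypothetical renderer: every core section appears, in order, even if empty."""
    lines = []
    for header in SECTION_ORDER:
        values = tags.get(header, [])  # a missing section becomes an empty list
        # Assumption: an empty section is the header with nothing after the colon.
        lines.append(f"{header}: {', '.join(values)}".rstrip())
    if sentiment is not None:  # Sentiment is emitted only when enabled
        lines.append(f"Sentiment: {sentiment}")
    return "\n".join(lines)


print(render_metadata({"Topics": ["Politics", "Environment"], "Locations": ["Washington"]}))
# Topics: Politics, Environment
# Categories:
# Persons:
# Organizations:
# Locations: Washington
# Entities:
```

Driving the loop from a fixed list, rather than from whatever keys the input happens to contain, is what guarantees the order never changes and no section is silently dropped.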

Operational Details

Pre-Configured Defaults

Model: GPT-4.1
Default tagging includes: Topics, Categories, Persons, Organizations, Locations, Entities
Sentiment tagging is disabled by default
Works with any article, transcript, or Data Stream

Safety & Fallbacks

If tag configuration is missing or malformed → falls back to defaults (Topics, Categories, Persons, Organizations, Locations, and Entities on; Sentiment off)
If no clear tags are found → returns empty lists
Never breaks or returns error messages — output is always valid
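The fallback behavior can be sketched as follows, assuming (hypothetically) that Tag Types arrive as a mapping of tag-type names to yes/no strings. The config format, the function `parse_tag_types`, and the rule that unlisted tag types keep their defaults are all assumptions; values are compared case-insensitively, so yes/YES/Yes are treated alike.

```python
from __future__ import annotations

# Defaults described above: all core tag types on, Sentiment off.
DEFAULT_TAG_TYPES = {
    "topics": True, "categories": True, "persons": True,
    "organizations": True, "locations": True, "entities": True,
    "sentiment": False,
}


def parse_tag_types(raw: object) -> dict[str, bool]:
    """Hypothetical parser: return on/off flags, or safe defaults if raw is missing or malformed."""
    if not isinstance(raw, dict) or not raw:
        return dict(DEFAULT_TAG_TYPES)  # missing or wrong type -> defaults
    parsed = dict(DEFAULT_TAG_TYPES)  # unlisted tag types keep their default
    for key, value in raw.items():
        name = str(key).strip().lower()
        flag = str(value).strip().lower()  # case-insensitive: yes/YES/Yes
        if name not in DEFAULT_TAG_TYPES or flag not in ("yes", "no"):
            return dict(DEFAULT_TAG_TYPES)  # any malformed entry -> safe defaults
        parsed[name] = flag == "yes"
    return parsed
```

Falling back for the whole configuration on any bad entry, rather than skipping just that entry, mirrors the all-or-nothing behavior described above and keeps the output shape predictable.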

Output Format

Always follows this structure:

Topics: [comma-separated list or empty list]  
Categories: [comma-separated list or empty list]  
Persons: [comma-separated list or empty list]  
Organizations: [comma-separated list or empty list]  
Locations: [comma-separated list or empty list]  
Entities: [comma-separated list or empty list]  
Sentiment: [Positive / Neutral / Negative] ← Only if enabled
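If you feed this output into a CMS or search index, you will need to parse it back into fields. A minimal sketch, assuming one `Header: value, value` line per section and that individual tags never contain commas. `parse_metadata` is a hypothetical helper; it tolerates optional square brackets around a list because the literal form of an empty list is not specified above.

```python
from __future__ import annotations

# Section headers listed in the Output Format.
KNOWN_HEADERS = {"Topics", "Categories", "Persons", "Organizations",
                 "Locations", "Entities", "Sentiment"}


def parse_metadata(text: str) -> dict[str, list[str]]:
    """Hypothetical parser: 'Header: a, b, c' lines -> {header: [tags]}; other lines skipped."""
    result: dict[str, list[str]] = {}
    for line in text.splitlines():
        header, sep, rest = line.partition(":")
        header = header.strip()
        if not sep or header not in KNOWN_HEADERS:
            continue  # not a metadata line
        rest = rest.strip()
        # Assumption: a list may or may not be wrapped in square brackets.
        if rest.startswith("[") and rest.endswith("]"):
            rest = rest[1:-1]
        # Split on commas and drop blanks, so an empty section becomes [].
        result[header] = [item.strip() for item in rest.split(",") if item.strip()]
    return result
```

Sentiment comes back as a one-item list here; read `result["Sentiment"][0]` when the key is present.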

Smart Behaviors & Validation

Case-insensitive parsing of config values (yes/YES/Yes all valid)
Ensures consistency in naming (no duplicates, no misspellings)
Applies strict definitions for each metadata type
Guarantees valid, structured output every run
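Duplicate removal is easy to mirror on your side when merging tags from several runs. This hypothetical `dedupe_tags` helper drops case-insensitive duplicates, collapses stray whitespace, and keeps the first spelling it sees; it does not attempt to catch misspellings.

```python
from __future__ import annotations


def dedupe_tags(tags: list[str]) -> list[str]:
    """Hypothetical helper: drop blank and case-insensitive duplicate tags, keeping first spelling."""
    seen: set[str] = set()
    unique: list[str] = []
    for tag in tags:
        cleaned = " ".join(tag.split())  # trim and collapse internal whitespace
        key = cleaned.casefold()  # casefold() is a stricter lower() for comparisons
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique
```

For example, `dedupe_tags(["Meta", "meta", " FBI ", "FBI"])` keeps only `Meta` and `FBI`, in that order.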

Guidance and Integration

Tips & Tricks

Enable Sentiment when analyzing opinion-heavy or social content.
Pair with Summarizer Agent to generate summaries + metadata together.
Use with Article to Email Agent to prepare tagged newsletter-ready content.
Apply consistent editorial Categories to power site-level filters and navigation.

Example Use Case

Input:
Article: a news story headlined “President signs new climate regulation bill in Washington.”
Config: Include all tags except Sentiment

Output:

Topics: Politics, Environment  
Categories: Government, Climate  
Persons: [President’s name]  
Organizations: Congress  
Locations: Washington, United States  
Entities: Climate Regulation, Environmental Policy

Troubleshooting

Empty lists returned? → Check if the content actually contains named entities.
Wrong tags showing? → Refine input or add an approved tag list for stricter control.
Missing Sentiment? → Confirm sentiment tagging is enabled in config.
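The approved-tag-list approach can also be enforced downstream. The sketch below assumes you keep your own approved values per section and filter the Agent's parsed output against them; `filter_to_approved` and the data shapes are hypothetical, matching is case-insensitive, and sections without an approved list pass through unchanged.

```python
from __future__ import annotations


def filter_to_approved(tags: dict[str, list[str]],
                       approved: dict[str, set[str]]) -> dict[str, list[str]]:
    """Hypothetical post-filter: keep only tags on each section's approved list."""
    filtered: dict[str, list[str]] = {}
    for section, values in tags.items():
        allowed = approved.get(section)
        if allowed is None:
            filtered[section] = list(values)  # no approved list: pass through
            continue
        allowed_keys = {a.casefold() for a in allowed}
        filtered[section] = [v for v in values if v.casefold() in allowed_keys]
    return filtered
```

Filtering after extraction keeps the Agent's output intact for auditing while your CMS only ever sees approved values.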

FAQs

Q: Does it invent tags?
No — it only extracts what is clearly present.

Q: Can I disable certain fields?
Yes — adjust Tag Types in configuration (case-insensitive).

Q: Can it handle multiple languages?
Yes — but extracted tags will match the source language.

Q: Does it always return valid output?
Yes — even with malformed or blank config, safe defaults apply.