Metadata Tagger Agent
Your precision indexer. The Metadata Tagger Agent reads any article, transcript, or content block and extracts clean, structured metadata. It ensures every piece of content is consistently tagged with the right people, places, topics, and entities — so your archives stay organized, searchable, and context-aware.
What It Does
This Agent analyzes source content and produces structured metadata fields: Topics, Categories, Persons, Organizations, Locations, Entities, and (if enabled) Sentiment. It applies strict tagging rules for accuracy and consistency: it never invents tags, never emits vague ones, and never breaks the output format.
Why It Matters
Metadata is the backbone of search, SEO, personalization, and editorial workflows. Without clean tagging, content gets lost in archives or misfiled in a CMS. This Agent makes metadata tagging scalable and trustworthy across thousands of articles.
When to Use It
Perfect for:
- Bulk-indexing content for internal CMS, archives, or knowledge bases
- Powering content discovery through topics, categories, and entities
- Identifying people, places, and organizations mentioned in articles
- Building automated topic clusters or personalization feeds
- Preparing metadata for newsletters or syndication partners
When Not to Use It
Avoid using this Agent:
- For stylistic or creative text generation (it produces only metadata)
- If you want summaries or headlines (use Summarizer or Headline Generator instead)
- For subjective editorial judgments (it applies strict definitions, not opinions)
Inputs and Configuration
Basic Configuration
Field | Required? | Description |
---|---|---|
Source Content | ✅ | Any article, Data Stream result, or pasted content. Used as the tagging base. |
Tag Types | Optional | Configuration of which metadata types to return. If missing or malformed, safe defaults are applied. |
Advanced Configuration
Field | Required? | Description |
---|---|---|
Language Model | ✅ | Default = Claude 3.5 |
Brand Guidelines | Not used | This Agent ignores brand tone and style fields — outputs are strictly metadata. |
Best Reasons & Examples for Each Input Type
- Free Text (pasted content)
  - Best for: Ad-hoc tagging of raw notes or internal copy.
  - Output: Metadata fields (topics, categories, people, etc.) neatly extracted.
- Data Stream (curated content feed)
  - Best for: Bulk processing large volumes of content for clustering or indexing.
  - Output: Consistent tags across all streamed articles.
- Source URL
  - Best for: Direct tagging of external content or syndicated pieces.
  - Output: Structured tags aligned to content themes and entities.
- Workspace Article
  - Best for: Tagging your own archive for newsletters or personalization.
Brand, Tone, and Rules
- This Agent does not apply brand voice or style.
- Output is always structured metadata, not prose.
- No editorial spin, no stylistic adjustments.
Metadata Rules
- Topics: Hierarchical subject matter (e.g., Society, Politics, Democracy)
- Categories: Editorial groupings (e.g., Health, Faith, Lifestyle)
- Persons: Named individuals (e.g., Joe Biden, Taylor Swift)
- Organizations: Groups, agencies, or companies (e.g., FBI, Meta)
- Locations: Cities, states, countries, or regions (e.g., Texas, Israel)
- Entities: Abstract concepts, issues, or key ideas (e.g., AI Regulation, Gun Control)
- Sentiment: Positive, Neutral, or Negative — only included if enabled.
Composition Rules
- Always output in the fixed order: Topics, Categories, Persons, Organizations, Locations, Entities, Sentiment (if enabled).
- Each section must appear even if empty.
- Never rename headers or alter structure.
- Lists should be comma-separated.
- No invented or vague tags — only what is clearly present in the source.
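For teams that post-process or mock the Agent's output, the composition rules above can be sketched in Python roughly as follows. This is an illustration only, not the Agent's actual implementation: `FIELD_ORDER` and `render_metadata` are hypothetical names, and rendering an empty section as a bare header is an assumption.

```python
from typing import Mapping, Optional, Sequence

# Fixed field order, taken from the composition rules above.
FIELD_ORDER = ("Topics", "Categories", "Persons", "Organizations", "Locations", "Entities")


def render_metadata(tags: Mapping[str, Sequence[str]], sentiment: Optional[str] = None) -> str:
    """Render tags in the fixed order; every section appears even when empty."""
    lines = []
    for field in FIELD_ORDER:
        values = tags.get(field, ())
        # Comma-separated list; an empty section keeps its header (assumed rendering).
        lines.append(f"{field}: {', '.join(values)}".rstrip())
    if sentiment is not None:
        # Sentiment is the last line and only appears when enabled.
        lines.append(f"Sentiment: {sentiment}")
    return "\n".join(lines)
```

Passing `sentiment=None` models the default case where Sentiment tagging is disabled, so the rendered block has exactly six lines.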
Operational Details
Pre-Configured Defaults
- Model: ChatGPT-4.1
- Default tagging includes: Topics, Categories, Persons, Organizations, Locations, Entities
- Sentiment tagging is disabled by default
- Works with any article, transcript, or Data Stream
Safety & Fallbacks
- If tag configuration is missing or malformed → falls back to default (all tags on, sentiment off)
- If no clear tags found → returns empty lists
- Never breaks or returns error messages — output is always valid
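The fallback behavior above can be illustrated with a short Python sketch. The real configuration format is not documented here, so this assumes a simple mapping of tag type to a yes/no style value; `DEFAULT_TAG_TYPES` and `resolve_tag_types` are hypothetical names used only for illustration.

```python
from typing import Any, Dict

# Defaults described above: all tag types on, Sentiment off.
DEFAULT_TAG_TYPES: Dict[str, bool] = {
    "Topics": True,
    "Categories": True,
    "Persons": True,
    "Organizations": True,
    "Locations": True,
    "Entities": True,
    "Sentiment": False,
}

# Accepted flag spellings (an assumption); matching is case-insensitive.
_TRUE = {"yes", "true", "on", "1"}
_FALSE = {"no", "false", "off", "0"}
_VALID = _TRUE | _FALSE


def resolve_tag_types(config: Any) -> Dict[str, bool]:
    """Return the enabled tag types, falling back to the defaults on any problem."""
    if not isinstance(config, dict) or not config:
        return dict(DEFAULT_TAG_TYPES)  # missing or wrong type -> defaults
    lookup = {name.lower(): name for name in DEFAULT_TAG_TYPES}
    resolved = dict(DEFAULT_TAG_TYPES)
    for key, value in config.items():
        name = lookup.get(str(key).strip().lower())
        flag = str(value).strip().lower()
        if name is None or flag not in _VALID:
            return dict(DEFAULT_TAG_TYPES)  # malformed entry -> defaults
        resolved[name] = flag in _TRUE
    return resolved
```

In this sketch a single unrecognized key or value discards the whole configuration, which mirrors the "malformed → default" rule rather than attempting a partial repair.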
Output Format
Always follows this structure:
Topics: [comma-separated list or empty list]
Categories: [comma-separated list or empty list]
Persons: [comma-separated list or empty list]
Organizations: [comma-separated list or empty list]
Locations: [comma-separated list or empty list]
Entities: [comma-separated list or empty list]
Sentiment: [Positive / Neutral / Negative] ← Only if enabled
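Because the structure is fixed, downstream systems can read it with a few lines of code. The following Python sketch shows one way to parse the output into a dictionary. `parse_metadata` is a hypothetical helper, and the sketch assumes that an empty section is rendered as a header with nothing after the colon and that individual tags do not contain commas.

```python
from typing import Dict, List, Union

# List-valued sections, in the documented order.
FIELDS = ("Topics", "Categories", "Persons", "Organizations", "Locations", "Entities")


def parse_metadata(text: str) -> Dict[str, Union[List[str], str]]:
    """Parse the Agent's line-based output into a dict keyed by section header."""
    parsed: Dict[str, Union[List[str], str]] = {}
    for line in text.splitlines():
        header, sep, rest = line.partition(":")
        header = header.strip()
        if not sep:
            continue  # not a "Header: values" line
        if header in FIELDS:
            # Split on commas and drop blanks so an empty section yields [].
            parsed[header] = [tag.strip() for tag in rest.split(",") if tag.strip()]
        elif header == "Sentiment":
            parsed["Sentiment"] = rest.strip()
    return parsed
```

The `Sentiment` key is present in the result only when the output contains a Sentiment line, so callers can check for it with `"Sentiment" in result`.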
Smart Behaviors & Validation
- Case-insensitive parsing of config values (yes/YES/Yes all valid)
- Ensures consistency in naming (no duplicates, no misspellings)
- Applies strict definitions for each metadata type
- Guarantees valid, structured output every run
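The naming-consistency behavior can be approximated when merging tags from several runs. The Python sketch below is a minimal illustration, not the Agent's internal logic: `normalize_tags` is a hypothetical helper that trims whitespace and removes case-insensitive duplicates, and it does not attempt spelling correction.

```python
from typing import Iterable, List


def normalize_tags(tags: Iterable[str]) -> List[str]:
    """Trim tags and drop case-insensitive duplicates, keeping the first spelling seen."""
    seen = set()
    result: List[str] = []
    for tag in tags:
        cleaned = " ".join(tag.split())  # trim ends and collapse inner whitespace
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result
```

For example, `normalize_tags(["FBI", " fbi ", "Meta"])` keeps `FBI` and `Meta` and drops the lowercase repeat.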
Guidance and Integration
Tips & Tricks
- Enable Sentiment when analyzing opinion-heavy or social content.
- Pair with Summarizer Agent to generate summaries and metadata together.
- Use with Article to Email Agent to prepare tagged newsletter-ready content.
- Apply consistent editorial Categories to power site-level filters and navigation.
Example Use Case
Input:
Article: “President signs new climate regulation bill in Washington.”
Config: Include all tags except Sentiment
Output:
Topics: Politics, Environment
Categories: Government, Climate
Persons: [President’s name, if named in the article]
Organizations: Congress
Locations: Washington, United States
Entities: Climate Regulation, Environmental Policy
Troubleshooting
- Empty lists returned? → Check if the content actually contains named entities.
- Wrong tags showing? → Refine input or add an approved tag list for stricter control.
- Missing Sentiment? → Confirm sentiment tagging is enabled in config.
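If wrong tags keep appearing, an approved tag list can also be enforced after the fact. How the Agent itself accepts such a list is not described here, so the Python sketch below shows only a downstream filter. `filter_approved` is a hypothetical helper, and the tag values are examples.

```python
from typing import Iterable, List


def filter_approved(tags: Iterable[str], approved: Iterable[str]) -> List[str]:
    """Keep only tags on the approved list, returned in their approved spelling."""
    canonical = {name.casefold(): name for name in approved}
    kept: List[str] = []
    for tag in tags:
        match = canonical.get(tag.strip().casefold())
        if match is not None and match not in kept:
            kept.append(match)
    return kept


# Example: only approved editorial Categories survive.
categories = filter_approved(["health", "Wellness", "FAITH"], ["Health", "Faith", "Lifestyle"])
```

Matching is case-insensitive, and anything not on the list (here `Wellness`) is dropped rather than mapped to a near match.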
FAQs
Q: Does it invent tags?
No — it only extracts what is clearly present.
Q: Can I disable certain fields?
Yes — adjust Tag Types in configuration (case-insensitive).
Q: Can it handle multiple languages?
Yes — but extracted tags will match the source language.
Q: Does it always return valid output?
Yes — even with malformed or blank config, safe defaults apply.