Traffic Classification
This specification is at an early draft stage. Ideas are open for change and debate. A lot of the content was developed with the help of Claude AI.
RequestClassifier
The RequestClassifier is the first processing stage in the SDK. It analyzes incoming requests and routes them to one of the three channels.
The primary gate is Vera header validation: a request carrying a valid Vera header set is classified as VERA_HUMAN. Any request without valid Vera headers can be blocked outright: 100% coverage, no allowlists required. Publishers that want finer-grained handling of non-Vera traffic can optionally classify it further into AI_AGENT, STANDARD_BROWSER, or UNKNOWN_BOT.
```python
from enum import Enum


class RequestorType(Enum):
    VERA_HUMAN = "vera_human"              # Vera browser with valid header set
    AI_AGENT = "ai_agent"                  # Known AI crawler (optional detection)
    STANDARD_BROWSER = "standard_browser"  # Chrome, Firefox, Safari, etc.
    UNKNOWN_BOT = "unknown_bot"            # No recognizable signal


def is_vera_request(request) -> bool:
    """Returns True if the request carries the minimum Vera header set (Tier 1+).

    Higher tiers additionally validate X-Vera-Token via the SDK (see Authentication).
    """
    ua = request.headers.get("User-Agent", "")
    return (
        ua.startswith("Vera/")
        and "X-Vera-Client-Version" in request.headers
    )


# Optional: known AI crawler User-Agent patterns for secondary classification.
# Not required to block AI traffic: any non-Vera request can be blocked via
# is_vera_request(). These patterns are useful only if differentiated handling
# is needed (e.g. a separate licensing path for crawlers).
KNOWN_AGENT_PATTERNS = [
    # OpenAI
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    # Anthropic
    "ClaudeBot", "anthropic-ai", "Claude-Web",
    # Google
    "Googlebot", "Google-Extended", "Gemini",
    # Meta
    "FacebookBot", "Meta-ExternalAgent",
    # Perplexity
    "PerplexityBot",
    # Cohere
    "cohere-ai",
    # Common Crawl (LLM training data)
    "CCBot",
    # Generic crawlers
    "Diffbot", "omgili", "DataForSeoBot",
]


def classify_request(request) -> RequestorType:
    # 1. Vera browser: validated by header set
    if is_vera_request(request):
        return RequestorType.VERA_HUMAN

    ua = request.headers.get("User-Agent", "")
    ua_lower = ua.lower()

    # 2. Optional: detect known AI crawlers for differentiated handling
    if any(p.lower() in ua_lower for p in KNOWN_AGENT_PATTERNS):
        return RequestorType.AI_AGENT

    # 3. Heuristic: no Accept-Language header → likely bot
    if not request.headers.get("Accept-Language"):
        return RequestorType.UNKNOWN_BOT

    return RequestorType.STANDARD_BROWSER
```
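One practical detail: the membership test on X-Vera-Client-Version in is_vera_request() only behaves as intended if request.headers is a case-insensitive mapping, which is what most Python web frameworks provide. HTTP header names are case-insensitive, so a plain dict would miss a header sent as x-vera-client-version. The sketch below shows one way to normalize raw headers before running the Tier 1 check. The helper names normalize_headers and has_vera_headers and the sample header values are hypothetical and not part of the SDK.

```python
from typing import Mapping


def normalize_headers(raw: Mapping[str, str]) -> dict[str, str]:
    """Lower-case header names so lookups are case-insensitive."""
    return {name.lower(): value for name, value in raw.items()}


def has_vera_headers(raw: Mapping[str, str]) -> bool:
    """Same Tier 1 check as is_vera_request(), applied to a plain dict of headers."""
    headers = normalize_headers(raw)
    ua = headers.get("user-agent", "")
    return ua.startswith("Vera/") and "x-vera-client-version" in headers


# Header names arrive in whatever case the client or an intermediate proxy used.
# The values "Vera/1.0" and "1.0" are illustrative only.
print(has_vera_headers({"user-agent": "Vera/1.0", "x-vera-client-version": "1.0"}))
print(has_vera_headers({"User-Agent": "Mozilla/5.0", "Accept-Language": "en"}))
```

If the framework already exposes a case-insensitive headers object, this normalization step is unnecessary.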
Classification Matrix
| Signal | Vera Human | AI Agent | Standard Browser | Unknown Bot |
|---|---|---|---|---|
| User-Agent: Vera/* + X-Vera-Client-Version | ✅ | — | — | — |
| Known AI crawler UA pattern | — | ✅ | — | — |
| No Accept-Language header | — | — | — | ✅ |
| None of the above (Accept-Language present; treated as a standard browser UA) | — | — | ✅ | — |
Access Matrix by Traffic Type
| Traffic type | Vera headers | Publisher options | Notes |
|---|---|---|---|
| Vera Human | ✅ Valid | Clean experience, subscription, pay-per-read | Token validated at Tier 2+ |
| Non-Vera (any) | ❌ Absent | Block entirely (403), or serve standard experience | 100% non-Vera blocking possible |
| AI Agent | ❌ Absent | Block (403) or license via separate agreement | Optional secondary classification |
| Standard Browser | ❌ Absent | Standard HTML, classic paywall | — |
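The access matrix can be read as a per-publisher policy table. The following is a minimal sketch of that idea, not the SDK's API: the Action enum, STRICT_POLICY, MIXED_POLICY, and decide are hypothetical names, the 200 status for served requests is an assumption, and RequestorType is repeated so the block runs on its own.

```python
from enum import Enum


class RequestorType(Enum):  # repeated here so the sketch is self-contained
    VERA_HUMAN = "vera_human"
    AI_AGENT = "ai_agent"
    STANDARD_BROWSER = "standard_browser"
    UNKNOWN_BOT = "unknown_bot"


class Action(Enum):  # hypothetical: not defined by the SDK
    SERVE_CLEAN = "serve_clean"        # clean experience for Vera users
    SERVE_STANDARD = "serve_standard"  # standard HTML, classic paywall
    BLOCK = "block"                    # respond with 403


# Strict policy: only Vera traffic is served.
STRICT_POLICY = {RequestorType.VERA_HUMAN: Action.SERVE_CLEAN}

# Mixed policy: ordinary browsers still get the standard experience.
MIXED_POLICY = {
    RequestorType.VERA_HUMAN: Action.SERVE_CLEAN,
    RequestorType.STANDARD_BROWSER: Action.SERVE_STANDARD,
}


def decide(kind: RequestorType, policy: dict[RequestorType, Action]) -> tuple[Action, int]:
    """Return the action and HTTP status; any type not listed in the policy is blocked."""
    action = policy.get(kind, Action.BLOCK)
    status = 403 if action is Action.BLOCK else 200  # 200 is an assumed success status
    return action, status


print(decide(RequestorType.AI_AGENT, STRICT_POLICY))
print(decide(RequestorType.STANDARD_BROWSER, MIXED_POLICY))
```

Defaulting unlisted types to BLOCK keeps the policy fail-closed, which matches the option of blocking any non-Vera request without enumerating crawlers.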
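Vera headers are injected by the browser at the network layer, and the signed X-Vera-Token cannot be reproduced by arbitrary clients. Note that is_vera_request() as shown only checks that the Tier 1 headers are present, which any client could imitate; the origin signal becomes reliable once X-Vera-Token is validated (Tier 2+). With that validation in place, publishers do not need to maintain UA allowlists or blocklists to achieve full coverage.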
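At Tier 2+ the header check is combined with token validation. The token format and the validation call are defined in the Authentication section rather than here, so the sketch below takes the validator as an injected callable. TokenValidator, is_verified_vera_request, and stub_validator are hypothetical stand-ins for whatever the SDK actually provides, and headers is assumed to be the framework's case-insensitive mapping.

```python
from typing import Callable, Mapping

# Hypothetical: stands in for the SDK's real X-Vera-Token validation.
TokenValidator = Callable[[str], bool]


def is_verified_vera_request(
    headers: Mapping[str, str], validate_token: TokenValidator
) -> bool:
    """Tier 1 header check followed by token validation (Tier 2+)."""
    ua = headers.get("User-Agent", "")
    if not (ua.startswith("Vera/") and "X-Vera-Client-Version" in headers):
        return False
    token = headers.get("X-Vera-Token")
    # A missing token fails closed; the validator is never called with None.
    return token is not None and validate_token(token)


def stub_validator(token: str) -> bool:
    """Demonstration only: accepts a single hard-coded value."""
    return token == "demo-token"


# Illustrative header values; real tokens are issued and checked by the SDK.
request_headers = {
    "User-Agent": "Vera/1.0",
    "X-Vera-Client-Version": "1.0",
    "X-Vera-Token": "demo-token",
}
print(is_verified_vera_request(request_headers, stub_validator))
```

Passing the validator in as a parameter keeps the classification step independent of how tokens are actually verified.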