Traffic Classification

Early Draft

This specification is at an early draft stage. Ideas are open for change and debate. A lot of the content was developed with the help of Claude AI.

RequestClassifier

The RequestClassifier is the first processing stage in the SDK. It analyzes incoming requests and routes them to one of the three channels.

The primary gate is Vera header validation: a request carrying a valid Vera header set is classified as VERA_HUMAN. Any request without valid Vera headers can be blocked outright: 100% coverage, no allowlists required. Publishers that want finer-grained handling of non-Vera traffic can optionally classify it further into AI_AGENT, STANDARD_BROWSER, or UNKNOWN_BOT.

from enum import Enum


class RequestorType(Enum):
    VERA_HUMAN = "vera_human"              # Vera browser with valid header set
    AI_AGENT = "ai_agent"                  # Known AI crawler (optional detection)
    STANDARD_BROWSER = "standard_browser"  # Chrome, Firefox, Safari, etc.
    UNKNOWN_BOT = "unknown_bot"            # No recognizable signal


def is_vera_request(request) -> bool:
    """Returns True if the request carries the minimum Vera header set (Tier 1+).

    Higher tiers additionally validate X-Vera-Token via the SDK — see Authentication.
    """
    ua = request.headers.get("User-Agent", "")
    return (
        ua.startswith("Vera/")
        and "X-Vera-Client-Version" in request.headers
    )


# Optional: known AI crawler User-Agent patterns for secondary classification.
# Not required to block AI traffic — any non-Vera request can be blocked via
# is_vera_request(). These patterns are useful only if differentiated handling
# is needed (e.g. a separate licensing path for crawlers).
KNOWN_AGENT_PATTERNS = [
    # OpenAI
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    # Anthropic
    "ClaudeBot", "anthropic-ai", "Claude-Web",
    # Google
    "Googlebot", "Google-Extended", "Gemini",
    # Meta
    "FacebookBot", "Meta-ExternalAgent",
    # Perplexity
    "PerplexityBot",
    # Cohere
    "cohere-ai",
    # Common Crawl (LLM training data)
    "CCBot",
    # Generic crawlers
    "Diffbot", "omgili", "DataForSeoBot",
]


def classify_request(request) -> RequestorType:
    # 1. Vera browser: validated by header set
    if is_vera_request(request):
        return RequestorType.VERA_HUMAN

    ua = request.headers.get("User-Agent", "")
    ua_lower = ua.lower()

    # 2. Optional: detect known AI crawlers for differentiated handling
    if any(p.lower() in ua_lower for p in KNOWN_AGENT_PATTERNS):
        return RequestorType.AI_AGENT

    # 3. Heuristic: no Accept-Language header → likely bot
    if not request.headers.get("Accept-Language"):
        return RequestorType.UNKNOWN_BOT

    return RequestorType.STANDARD_BROWSER
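
The functions above assume that request.headers performs case-insensitive lookups, as most web frameworks' header objects do. HTTP header names are case-insensitive, so a plain dict keyed by whatever casing arrived on the wire would make the check miss genuine Vera requests. The sketch below is one way to exercise the check in tests; CaseInsensitiveHeaders and FakeRequest are hypothetical stand-ins, not part of the SDK, and is_vera_request is repeated from the listing above so the example runs on its own.

```python
from types import SimpleNamespace


class CaseInsensitiveHeaders(dict):
    """Hypothetical test helper: stores and looks up header names in lowercase."""

    def __init__(self, items=None):
        super().__init__()
        for name, value in (items or {}).items():
            self[name] = value  # goes through the overridden __setitem__

    def __setitem__(self, name, value):
        super().__setitem__(name.lower(), value)

    def __getitem__(self, name):
        return super().__getitem__(name.lower())

    def __contains__(self, name):
        return super().__contains__(name.lower())

    def get(self, name, default=None):
        return super().get(name.lower(), default)


class FakeRequest:
    """Hypothetical stand-in for a framework request object."""

    def __init__(self, headers):
        self.headers = CaseInsensitiveHeaders(headers)


def is_vera_request(request) -> bool:
    # Repeated from the specification: Tier 1 presence check only.
    ua = request.headers.get("User-Agent", "")
    return ua.startswith("Vera/") and "X-Vera-Client-Version" in request.headers


# Lowercase names, as an HTTP/2 server would typically deliver them.
wire_headers = {"user-agent": "Vera/1.2", "x-vera-client-version": "1.2.0"}

vera = FakeRequest(wire_headers)
chrome = FakeRequest({"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US"})
# Same headers in a plain dict: the lookups are case-sensitive and miss.
plain = SimpleNamespace(headers=dict(wire_headers))

print(is_vera_request(vera), is_vera_request(chrome), is_vera_request(plain))
```

If the framework in use already exposes a case-insensitive header mapping, no wrapper is needed; the point is only that the classifier should not be handed a raw dict.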

Classification Matrix

| Signal | Vera Human | AI Agent | Standard Browser | Unknown Bot |
|---|---|---|---|---|
| User-Agent: Vera/* + X-Vera-Client-Version | ✅ | | | |
| Known AI crawler UA pattern | | ✅ | | |
| No Accept-Language header | | | | ✅ |
| Standard browser UA | | | ✅ | |

Access Matrix by Traffic Type

| Traffic type | Vera headers | Publisher options | Notes |
|---|---|---|---|
| Vera Human | ✅ Valid | Clean experience, subscription, pay-per-read | Token validated at Tier 2+ |
| Non-Vera (any) | ❌ Absent | Block entirely (403), or serve standard experience | 100% non-Vera blocking possible |
| AI Agent | ❌ Absent | Block (403) or license via separate agreement | Optional secondary classification |
| Standard Browser | ❌ Absent | Standard HTML, classic paywall | |

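One way to express the access matrix in code is a lookup table from traffic type to publisher action. The sketch below is illustrative only: Action, DEFAULT_POLICY, STRICT_POLICY, and resolve are hypothetical names, and blocking UNKNOWN_BOT in the default policy is an assumption, since the matrix does not list that traffic type. RequestorType is repeated from the classifier listing so the example is self-contained.

```python
from enum import Enum
from typing import Dict, Tuple


class RequestorType(Enum):
    # Repeated from the classifier listing.
    VERA_HUMAN = "vera_human"
    AI_AGENT = "ai_agent"
    STANDARD_BROWSER = "standard_browser"
    UNKNOWN_BOT = "unknown_bot"


class Action(Enum):
    # Hypothetical publisher actions derived from the access matrix.
    SERVE_CLEAN = "serve_clean"        # clean experience for Vera users
    SERVE_STANDARD = "serve_standard"  # standard HTML, classic paywall
    BLOCK = "block"                    # respond with 403


Policy = Dict[RequestorType, Action]

# Differentiated handling: standard browsers still get the classic experience.
DEFAULT_POLICY: Policy = {
    RequestorType.VERA_HUMAN: Action.SERVE_CLEAN,
    RequestorType.AI_AGENT: Action.BLOCK,
    RequestorType.STANDARD_BROWSER: Action.SERVE_STANDARD,
    RequestorType.UNKNOWN_BOT: Action.BLOCK,  # assumption, not in the matrix
}

# "Block entirely": everything that is not Vera gets a 403.
STRICT_POLICY: Policy = {
    t: Action.SERVE_CLEAN if t is RequestorType.VERA_HUMAN else Action.BLOCK
    for t in RequestorType
}


def resolve(requestor: RequestorType, policy: Policy = DEFAULT_POLICY) -> Tuple[int, Action]:
    """Return the HTTP status code and action for a classified request."""
    action = policy[requestor]
    status = 403 if action is Action.BLOCK else 200
    return status, action


print(resolve(RequestorType.STANDARD_BROWSER))
print(resolve(RequestorType.STANDARD_BROWSER, STRICT_POLICY))
```

Keeping the policy as data rather than branching logic lets a publisher switch between full blocking and differentiated handling without touching the classifier.
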
Blocking non-Vera traffic

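Vera headers are injected by the browser at the network layer, and at Tier 2 and above they are backed by a signed X-Vera-Token that arbitrary clients cannot reproduce. With that token validated, is_vera_request() provides a reliable origin signal; the Tier 1 check on its own only tests the User-Agent prefix and the presence of X-Vera-Client-Version, which any client could send. Publishers do not need to maintain UA allowlists or blocklists to achieve full coverage.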
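
As a rough illustration of blocking at the application boundary, the sketch below wraps a WSGI application so that any request lacking the Tier 1 Vera header set receives a 403. The names vera_only, article_app, and call_app are hypothetical, the wrapper is not the SDK's own integration, and it performs the presence check only; X-Vera-Token validation, which the specification delegates to the SDK, is deliberately left out.

```python
def vera_only(app):
    """Hypothetical WSGI wrapper: pass Vera requests through, 403 everything else."""

    def middleware(environ, start_response):
        # WSGI exposes request headers as HTTP_* keys in environ.
        ua = environ.get("HTTP_USER_AGENT", "")
        has_version = "HTTP_X_VERA_CLIENT_VERSION" in environ
        if ua.startswith("Vera/") and has_version:
            return app(environ, start_response)

        body = b"Forbidden: Vera browser required\n"
        start_response("403 Forbidden", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return middleware


def article_app(environ, start_response):
    """Stand-in for the publisher's real application."""
    body = b"article\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, environ):
    """Invoke a WSGI app directly and return (status, body) for inspection."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


gated = vera_only(article_app)

vera_environ = {
    "HTTP_USER_AGENT": "Vera/1.2",
    "HTTP_X_VERA_CLIENT_VERSION": "1.2.0",
}
crawler_environ = {"HTTP_USER_AGENT": "GPTBot/1.0"}

print(call_app(gated, vera_environ))
print(call_app(gated, crawler_environ))
```

No User-Agent pattern list appears anywhere in the wrapper: the crawler is rejected because it lacks the Vera header set, not because its name was recognized.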