flowCreate.solutions

XSS Prevention

This document details the Cross-Site Scripting (XSS) prevention implementation, including HTML sanitization, field validators, and max length enforcement standards.

Overview

XSS prevention is not “sanitize every string”. A professional baseline is:

  • Validate inputs by type/format/length (reject invalid values).
  • Sanitize only where you intentionally accept HTML (e.g., rich-text fields).
  • Never mutate credentials (e.g., passwords). Prevent XSS by not reflecting/logging secrets and by escaping output where appropriate.
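
The "escape output" part of this baseline can be sketched with the standard library's html.escape. The render_comment helper and the markup it produces are hypothetical and only illustrate the idea; template engines with autoescaping typically do the same work.

```python
import html

def render_comment(author: str, body: str) -> str:
    """Hypothetical helper: build an HTML fragment from untrusted plain text."""
    # Escape at output time; the stored values themselves are left untouched.
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body, quote=True)
    return f'<p class="comment"><b>{safe_author}</b>: {safe_body}</p>'

fragment = render_comment("Mallory", '<img src=x onerror="alert(1)">')
print(fragment)
```

The payload ends up as visible text rather than as an element, because its angle brackets and quotes are turned into entities before they reach the page.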

Minimum requirements:

  • Max length validation on every user-controlled string field.
  • Allowlist validation for identifiers and other constrained strings (reject on mismatch).
  • HTML sanitization only for fields that are designed to store/render HTML.

Core Implementation

sanitize_html() Function

Standard Location: utils/security.py

Purpose: Remove dangerous HTML/JavaScript while preserving safe content

Features:

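  • Strips tags that are not on the allowlist, such as <script>, <iframe>, <object>, <embed> (with strip=True the tag itself is removed; any text inside it is kept as plain text)
  • Removes attributes that are not on the allowlist, which covers event handlers: onclick, onerror, onload, etc.
  • Drops href values whose scheme is not http, https, or mailto (e.g., javascript:)
  • Preserves safe HTML tags for rich text
  • Alphanumeric IDs pass through unchanged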

Standard Implementation:

import bleach

def sanitize_html(value: str) -> str:
    """
    Sanitize HTML content while preserving safe formatting.
    Uses bleach library.
    """
    if not value:
        return value
    
    # Allow safe HTML tags
    allowed_tags = ['p', 'br', 'strong', 'em', 'u', 'a', 'ul', 'ol', 'li', 'h1', 'h2', 'h3']
    allowed_attributes = {'a': ['href', 'title']}
    allowed_protocols = ['http', 'https', 'mailto']
    
    # Sanitize using bleach
    clean_value = bleach.clean(
        value,
        tags=allowed_tags,
        attributes=allowed_attributes,
        protocols=allowed_protocols,
        strip=True
    )
    
    return clean_value

What Gets Sanitized

Removed:

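  • <script> tags (the tags are stripped; the text between them stays behind as plain text)
  • <iframe> embeddings
  • Event handlers (onclick, onerror, etc.)
  • href values that use javascript:, data:, or any other scheme outside the allowlist

Plain text such as eval(...) is not removed: bleach filters markup, not JavaScript source, and that text cannot execute when it is rendered as HTML text.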

Preserved:

  • Regular text
  • Safe HTML formatting (<p>, <strong>, <em>)
  • Links with sanitized URLs
  • Special characters such as !, @, and # pass through unchanged. Passwords are still never passed to sanitize_html(), because &, <, and > in text are entity-escaped (see below)

Schema Implementation Patterns

Basic Pattern

from pydantic import BaseModel, Field, field_validator
from utils.security import sanitize_html

class EntityCreate(BaseModel):
    # Plain-text field: length-limited only; escape it on output.
    name: str = Field(..., max_length=100)
    # Rich-text field: designed to store/render HTML, so it is sanitized.
    description: str = Field(..., max_length=500)

    # Only sanitize fields that are intended to accept HTML/rich text.
    @field_validator('description')
    @classmethod
    def sanitize_description(cls, v: str) -> str:
        return sanitize_html(v)

Identifier validation (IDs): allowlist + reject (do not sanitize)

Identifiers are not “rich text”. Treat them as constrained inputs:

  • Allowlist characters
  • Reject invalid values (422)
  • Keep max lengths small and consistent

Recommended ID rule (example):

  • length: 1–100
  • chars: alphanumeric plus _ and -

import re

from pydantic import BaseModel, Field, field_validator

ID_RE = re.compile(r"^[a-zA-Z0-9_-]{1,100}$")

def validate_id(value: str) -> str:
    if not value:
        raise ValueError("ID is required")
    if not ID_RE.fullmatch(value):
        raise ValueError("Invalid ID format")
    return value

class EntityGet(BaseModel):
    entity_id: str = Field(..., max_length=100)

    _validate_ids = field_validator("entity_id")(lambda cls, v: validate_id(v))
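
To see the allowlist behave, it can be run against a few values. This sketch repeats ID_RE and validate_id from above so that it runs on its own; is_valid_id and the sample inputs are made up for the demonstration.

```python
import re

ID_RE = re.compile(r"^[a-zA-Z0-9_-]{1,100}$")

def validate_id(value: str) -> str:
    if not value:
        raise ValueError("ID is required")
    if not ID_RE.fullmatch(value):
        raise ValueError("Invalid ID format")
    return value

def is_valid_id(value: str) -> bool:
    """Hypothetical wrapper used only for this demonstration."""
    try:
        validate_id(value)
        return True
    except ValueError:
        return False

# Two well-formed IDs followed by inputs that the allowlist should reject.
samples = ["user_123", "org-42", "<script>", "a b", "x" * 101, "", "id\n"]
for sample in samples:
    print(repr(sample[:20]), is_valid_id(sample))
```

Using fullmatch matters for the last sample: with re.match, the trailing $ would also accept an ID followed by a newline, whereas fullmatch requires the whole string to fit the pattern.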

List Fields

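@field_validator('tags', mode='before')
@classmethod
def sanitize_tags(cls, v):
    # mode='before' sees raw input, so check the type before iterating.
    if not isinstance(v, list):
        return v
    # Reject overlong items instead of slicing sanitized output, which can
    # cut an entity or tag in half and leave malformed markup.
    if any(isinstance(item, str) and len(item) > 50 for item in v):
        raise ValueError("each tag must be at most 50 characters")
    return [sanitize_html(item) for item in v if isinstance(item, str)]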

Nested Structures

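@field_validator('metadata', mode='before')
@classmethod
def sanitize_metadata(cls, v):
    if not isinstance(v, dict):
        return v
    # Reject overlong values rather than slicing sanitized output.
    if any(isinstance(val, str) and len(val) > 5000 for val in v.values()):
        raise ValueError("metadata values must be at most 5000 characters")
    # Only top-level string values are sanitized; keys and nested
    # containers pass through unchanged.
    return {
        k: sanitize_html(val) if isinstance(val, str) else val
        for k, val in v.items()
    }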

Max Length Validation

Every string field has a max length to prevent payload abuse:

Field Type      Max Length               Example
ID Fields       100 chars                user_id, org_id
Passwords       8-128 chars (min-max)    User passwords
Names/Titles    100-300 chars            Entity names
Descriptions    500-1000 chars           Short descriptions
Long Text       2000-5000 chars          Messages, content
Rich Text       10000 chars              HTML content
URLs            500-1000 chars           Web addresses
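
One way to keep these limits consistent is to define them once and reject anything longer. The sketch below is an assumption-laden illustration: MAX_LENGTHS and check_max_length are hypothetical names, and the numbers are simply the upper end of each range in the table.

```python
# Hypothetical limits: the upper end of each range in the table above.
MAX_LENGTHS = {
    "id": 100,
    "name": 300,
    "description": 1000,
    "long_text": 5000,
    "rich_text": 10000,
    "url": 1000,
}

def check_max_length(kind: str, value: str) -> str:
    """Return value unchanged, or raise if it exceeds the limit for its kind."""
    limit = MAX_LENGTHS[kind]
    # len() counts Unicode code points, not bytes.
    if len(value) > limit:
        raise ValueError(f"{kind} exceeds {limit} characters")
    return value

print(check_max_length("id", "user_123"))
```

Rejecting, rather than truncating, keeps the stored value identical to what the client sent and makes the failure visible to the caller.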

Password fields: never sanitize (do not mutate credentials)

Passwords (and other secrets) must not be sanitized or transformed.

  • Validate length (and optionally basic character constraints if you have a requirement).
  • Never include passwords in error messages.
  • Never log passwords.
  • Hash and store using a strong password hashing scheme (e.g., bcrypt via passlib).
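
The first two rules can be sketched as a length-only check that returns the password untouched and keeps it out of error messages. The function and constant names are hypothetical; the 8 and 128 limits come from the table above. Hashing would happen afterwards and is not shown here.

```python
MIN_PASSWORD_LENGTH = 8
MAX_PASSWORD_LENGTH = 128

def validate_password(password: str) -> str:
    """Check length only and return the password exactly as received."""
    # The messages describe the rule and never echo the submitted value.
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError(f"Password must be at least {MIN_PASSWORD_LENGTH} characters")
    if len(password) > MAX_PASSWORD_LENGTH:
        raise ValueError(f"Password must be at most {MAX_PASSWORD_LENGTH} characters")
    return password

raw = "p@ss<w0rd>&!"
# Characters that an HTML sanitizer would strip or escape survive unchanged.
assert validate_password(raw) == raw
```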

Implementation Checklist

When adding new input schemas:

  • Add max_length to all user-controlled string fields
  • For ID fields: add allowlist validation (reject invalid values; do not sanitize)
  • For HTML/rich-text fields only: add field_validator with sanitize_html()
  • Sanitize list items individually
  • Sanitize nested dict values
  • Test with XSS payloads
  • Test max length enforcement
  • Verify that normal content is preserved
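
The two testing items can be sketched as a small audit that parses a sanitizer's output and reports any tag, attribute, or href scheme outside the allowlist. Everything here (MarkupAudit, audit_markup, check_sanitizer, the payload list) is hypothetical. html.escape is used as a stand-in sanitizer so the sketch runs with the standard library alone; in a project that follows this document, the same check would presumably be pointed at sanitize_html.

```python
import html
from html.parser import HTMLParser
from typing import Callable, List, Tuple
from urllib.parse import urlsplit

# Mirrors the allowlist used by sanitize_html() in this document.
ALLOWED_TAGS = {"p", "br", "strong", "em", "u", "a", "ul", "ol", "li", "h1", "h2", "h3"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL

def _scheme_allowed(url: str) -> bool:
    try:
        return urlsplit(url.strip()).scheme in ALLOWED_SCHEMES
    except ValueError:
        # Unparseable URLs are treated as violations.
        return False

class MarkupAudit(HTMLParser):
    """Records every start tag, attribute, or href scheme outside the allowlist."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.violations: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.violations.append(f"tag <{tag}>")
            return
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                self.violations.append(f"attribute {name} on <{tag}>")
            elif name == "href" and not _scheme_allowed(value or ""):
                self.violations.append(f"href scheme in {value!r}")

def audit_markup(markup: str) -> List[str]:
    parser = MarkupAudit()
    parser.feed(markup)
    parser.close()
    return parser.violations

XSS_PAYLOADS = [
    "<script>alert(1)</script>",
    "<img src=x onerror=alert(1)>",
    '<a href="javascript:alert(1)">click</a>',
    '<p onclick="alert(1)">hi</p>',
    '<iframe src="https://example.com"></iframe>',
]

def check_sanitizer(sanitize: Callable[[str], str]) -> List[Tuple[str, List[str]]]:
    """Return (payload, violations) for each payload whose output still has active markup."""
    failures = []
    for payload in XSS_PAYLOADS:
        violations = audit_markup(sanitize(payload))
        if violations:
            failures.append((payload, violations))
    return failures

# A sanitizer that escapes everything leaves no markup behind.
assert check_sanitizer(html.escape) == []
# Passing input through untouched fails on every payload.
assert len(check_sanitizer(lambda s: s)) == len(XSS_PAYLOADS)
# Ordinary formatted content produces no violations.
assert audit_markup("<p>Hello <strong>world</strong></p>") == []
```

Parsing the output, instead of searching it for substrings like "onerror", avoids false alarms when a payload survives only as harmless escaped text.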

Example: Complete Schema

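from typing import List, Optional
from urllib.parse import urlsplit

from pydantic import BaseModel, Field, field_validator
from utils.security import sanitize_html

class ProductCreate(BaseModel):
    """Schema for creating a product with XSS prevention."""

    # Required fields with validation
    name: str = Field(..., max_length=100)          # plain text: escape on output
    description: str = Field(..., max_length=1000)  # rich text: sanitized below
    price: float = Field(..., gt=0)

    # Optional fields
    category: Optional[str] = Field(None, max_length=50)  # plain text
    tags: Optional[List[str]] = Field(default_factory=list)
    image_url: Optional[str] = Field(None, max_length=500)

    # Sanitize only the field that is designed to hold HTML
    @field_validator('description')
    @classmethod
    def sanitize_description(cls, v: str) -> str:
        return sanitize_html(v)

    # URLs are validated, not sanitized: HTML-escaping would corrupt "&" in query strings
    @field_validator('image_url')
    @classmethod
    def validate_image_url(cls, v: Optional[str]) -> Optional[str]:
        if v is not None and urlsplit(v).scheme not in ('http', 'https'):
            raise ValueError("image_url must use http or https")
        return v

    # Sanitize list fields; reject overlong items instead of truncating
    @field_validator('tags', mode='before')
    @classmethod
    def sanitize_tags(cls, v):
        if not isinstance(v, list):
            return v
        if any(isinstance(item, str) and len(item) > 50 for item in v):
            raise ValueError("each tag must be at most 50 characters")
        return [sanitize_html(item) for item in v if isinstance(item, str)]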