SaniText is a tool for detecting and removing potentially problematic characters from text. It helps ensure that your text contains only the characters you choose to allow.
Features
Detect Only
Identifies suspicious characters in your text without modifying it. This is useful for checking what characters might be problematic before sanitizing.
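As a rough illustration, detection amounts to scanning the text and reporting every character outside the allowed set while leaving the input untouched. The Python sketch below assumes the default set is printable ASCII (U+0020 to U+007E) plus tab, newline, and carriage return. That whitespace choice and the function name `detect_suspicious` are hypothetical, not SaniText's documented behavior.

```python
# Hypothetical sketch of "Detect Only": list disallowed characters without
# changing the text. ASSUMPTION: the default allowed set is printable ASCII
# (U+0020..U+007E) plus tab, newline, and carriage return.
DEFAULT_ALLOWED = frozenset(chr(cp) for cp in range(0x20, 0x7F)) | {"\t", "\n", "\r"}


def detect_suspicious(text: str, allowed: frozenset = DEFAULT_ALLOWED) -> list[tuple[int, str]]:
    """Return (index, character) pairs for each character not in `allowed`."""
    return [(i, ch) for i, ch in enumerate(text) if ch not in allowed]


if __name__ == "__main__":
    # A no-break space and a Cyrillic capital A hidden in ordinary-looking text.
    sample = "Pay\u00a0\u0410mazon now"
    for index, ch in detect_suspicious(sample):
        print(f"position {index}: U+{ord(ch):04X}")
```

Running it prints `position 3: U+00A0` and `position 4: U+0410`; the sample string itself is not modified.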
Sanitize Text
Processes your text by removing or replacing characters that aren't in the allowed set. By default, only ASCII printable characters are allowed.
Paste from Clipboard
Quickly inserts text from your clipboard into the input area.
Copy to Clipboard
Copies the sanitized output text to your clipboard for easy use elsewhere.
Options
Allow Emoji
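When checked, common single-code-point emoji (like 😀, 👍, and 🎉) are preserved during sanitization instead of being removed. Emoji made of several code points, such as ❤️ (U+2764 followed by the variation selector U+FE0F), may not survive intact.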
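Python's standard library has no emoji property, so one way to approximate "single-code-point emoji" is a range check over the main emoji-bearing Unicode blocks. The sketch below is an assumption about how such a check might work, not SaniText's actual list: the ranges are approximate and also contain some non-emoji symbols, and `is_single_codepoint_emoji` is a hypothetical name.

```python
# Hypothetical heuristic for "Allow Emoji". ASSUMPTION: a character counts as a
# single-code-point emoji if it lies in one of these Unicode blocks. Some of
# these blocks also contain symbols that are not emoji.
EMOJI_RANGES = [
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats (includes U+2764 HEAVY BLACK HEART)
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x1FA70, 0x1FAFF),  # Symbols and Pictographs Extended-A
]


def is_single_codepoint_emoji(s: str) -> bool:
    """Return True only if `s` is exactly one code point inside an assumed emoji range."""
    if len(s) != 1:
        return False
    cp = ord(s)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


if __name__ == "__main__":
    for s in ["\U0001F600", "\U0001F44D", "\u2764\ufe0f", "A"]:
        print(repr(s), is_single_codepoint_emoji(s))
```

Because the check works per code point, "❤️" (U+2764 followed by U+FE0F) fails as a whole string. When text is filtered one character at a time, U+2764 passes but the variation selector U+FE0F does not, so the heart may lose its emoji-style presentation.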
Additional allowed characters
Lists extra characters to allow even though they are not printable ASCII. For example, entering "αñøç" keeps the Greek letter α and the Latin letters ñ, ø, and ç in your text.
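Conceptually, this option adds each character you enter to the allowed set. In the sketch below, the printable-ASCII default and the name `with_additional` are assumptions made for illustration.

```python
# Hypothetical sketch: extend the allowed set with user-supplied characters.
# ASSUMPTION: the default set is printable ASCII (U+0020..U+007E).
DEFAULT_ALLOWED = frozenset(chr(cp) for cp in range(0x20, 0x7F))


def with_additional(extra: str, base: frozenset = DEFAULT_ALLOWED) -> frozenset:
    """Return `base` plus every character in `extra`; duplicates are harmless."""
    return base | frozenset(extra)


if __name__ == "__main__":
    allowed = with_additional("αñøç")
    text = "señor α→ω"
    # Keep only allowed characters: "→" and "ω" are dropped.
    print("".join(ch for ch in text if ch in allowed))
```

This prints `señor α`. Matching is by code point, so a precomposed "ñ" (U+00F1) and "n" followed by a combining tilde (U+0303) count as different characters; allowing one does not allow the other.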
Load allowed characters from file
Upload a text file containing characters you want to allow. Every character in the file will be added to the allowed set. This is useful for handling large sets of special characters.
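A minimal sketch of loading the allowed set from a file, assuming the file is UTF-8 encoded (the tool's actual encoding handling is not stated here). Following the description above, every character is added, including line breaks. `load_allowed_from_file` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def load_allowed_from_file(path: str, encoding: str = "utf-8") -> frozenset:
    """Return every distinct character in the file as a set.

    Text mode applies universal newlines, so CRLF line endings contribute
    only LF (U+000A). With encoding="utf-8", a leading byte order mark is
    kept as U+FEFF; pass encoding="utf-8-sig" to drop it.
    """
    return frozenset(Path(path).read_text(encoding=encoding))


if __name__ == "__main__":
    # Write a small sample file, load it, then clean up.
    fd, tmp = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("αβγ\nñ")
    try:
        print(sorted(load_allowed_from_file(tmp)))
    finally:
        os.remove(tmp)
```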
Interactive Mode
When enabled, SaniText will prompt you for each unique disallowed character, giving you the option to keep it, remove it, or replace it with something else. This gives you fine-grained control over the sanitization process.
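Interactive mode can be sketched as collecting one decision per unique disallowed character and then applying it to every occurrence. In this Python sketch, the single-letter answers (k, r, p), the prompt wording, and the function names are hypothetical. The `ask` parameter defaults to `input` and can be swapped for a stub when testing.

```python
from typing import Callable


def interactive_decisions(
    text: str,
    allowed: set[str],
    ask: Callable[[str], str] = input,
) -> dict[str, str]:
    """Prompt once per unique disallowed character and record the outcome.

    "Keep" maps a character to itself, "remove" maps it to "", and
    "replace" maps it to whatever text the user enters.
    """
    decisions: dict[str, str] = {}
    for ch in dict.fromkeys(text):  # unique characters in first-seen order
        if ch in allowed:
            continue
        while True:  # re-prompt until the answer is recognized
            choice = ask(f"U+{ord(ch):04X} {ch!r}: [k]eep, [r]emove, or re[p]lace? ").strip().lower()
            if choice in ("k", "r", "p"):
                break
        if choice == "k":
            decisions[ch] = ch
        elif choice == "r":
            decisions[ch] = ""
        else:
            decisions[ch] = ask("Replace with: ")
    return decisions


def apply_decisions(text: str, decisions: dict[str, str]) -> str:
    """Rebuild the text, substituting each decided character everywhere."""
    return "".join(decisions.get(ch, ch) for ch in text)
```

Because decisions are keyed by character, answering once for "é" applies to every "é" in the text.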
Verbose Mode
Shows detailed information about detected suspicious characters, including their Unicode code points and names. This helps you understand what characters were found and how they were handled.
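Python's `unicodedata` module exposes the kind of information verbose mode describes. The sketch below formats each disallowed character as its code point and Unicode name, with an occurrence count; the report layout and function names are assumptions, not SaniText's actual output.

```python
import unicodedata
from collections import Counter


def describe(ch: str) -> str:
    """Format one character as 'U+XXXX NAME'."""
    # unicodedata.name() raises ValueError for code points without a name
    # (such as most control characters) unless a default is supplied.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def verbose_report(text: str, allowed: set[str]) -> list[str]:
    """One line per distinct disallowed character, in first-seen order."""
    counts = Counter(ch for ch in text if ch not in allowed)
    return [f"{describe(ch)} (x{n})" for ch, n in counts.items()]


if __name__ == "__main__":
    ascii_printable = {chr(cp) for cp in range(0x20, 0x7F)}
    for line in verbose_report("a\u200bb\u00a0c\u200b", ascii_printable):
        print(line)
```

This prints `U+200B ZERO WIDTH SPACE (x2)` followed by `U+00A0 NO-BREAK SPACE (x1)`.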
How Sanitization Works
SaniText follows these steps when sanitizing text:
1. Determines the set of allowed characters based on your options
2. Identifies all characters in your text that aren't in the allowed set
3. For each disallowed character:
   In interactive mode: asks you what to do with it
   In automatic mode: tries to replace it with a similar ASCII character; if there is none, removes it
4. Constructs the sanitized text from your decisions or the automatic replacements
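The steps above map onto a short function: build the allowed set, list each disallowed character once, get one decision per character, then rebuild the text. In this sketch, `decide` stands in for either the interactive prompt or the automatic replacement. The function names, the printable-ASCII default, and the small lookalike table in the usage example are hypothetical.

```python
from typing import Callable


def build_allowed(extra: str = "") -> frozenset:
    """Determine the allowed set. ASSUMPTION: printable ASCII plus any extras."""
    return frozenset(chr(cp) for cp in range(0x20, 0x7F)) | frozenset(extra)


def sanitize(text: str, allowed: frozenset, decide: Callable[[str], str]) -> str:
    """Apply one decision per unique disallowed character to the whole text."""
    # Identify disallowed characters, each listed once in first-seen order.
    disallowed = [ch for ch in dict.fromkeys(text) if ch not in allowed]
    # Get a decision for each: replacement text, or "" to remove it.
    decisions = {ch: decide(ch) for ch in disallowed}
    # Construct the sanitized text; allowed characters pass through unchanged.
    return "".join(decisions.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Hypothetical automatic strategy: a tiny lookalike table, else remove.
    lookalikes = {"\u201c": '"', "\u201d": '"', "\u2013": "-"}
    text = "\u201ccaf\u00e9\u201d \u2013 ok\u200b"
    print(sanitize(text, build_allowed(extra="\u00e9"), lambda ch: lookalikes.get(ch, "")))
```

This prints `"café" - ok`: the curly quotes and en dash are replaced, "é" is kept because it was added to the allowed set, and the zero-width space is removed. Asking only once per unique character is what lets interactive mode avoid repeating the same question.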
Character Replacement
When replacing characters, SaniText uses several strategies:
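Homoglyph mapping: Characters that look like ASCII characters are replaced with their ASCII lookalikes (for example, the Greek capital letter alpha "Α", U+0391, becomes "A")
Unicode normalization: Accented and other composed characters are decomposed, and the base character is kept when possible (for example, "é" becomes "e")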
Removal: Characters with no good ASCII equivalent are removed
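The three strategies above can be chained so that each is tried in order. In this sketch the homoglyph table is a tiny illustrative sample, the use of NFKD (compatibility decomposition) rather than NFD is an assumption, and `ascii_replacement` is a hypothetical name.

```python
import unicodedata

# Illustrative homoglyph table (hypothetical; a usable one would be far larger).
HOMOGLYPHS = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}


def ascii_replacement(ch: str) -> str:
    """Return an ASCII stand-in for `ch`, or "" when there is no good one."""
    # 1. Homoglyph mapping.
    if ch in HOMOGLYPHS:
        return HOMOGLYPHS[ch]
    # 2. Normalization: decompose (NFKD, an assumption), then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    if base and base.isascii() and base.isprintable():
        return base
    # 3. Removal.
    return ""


if __name__ == "__main__":
    for ch in ["\u0391", "\u00e9", "\ufb01", "\u00f8", "\u200b"]:
        print(f"U+{ord(ch):04X} -> {ascii_replacement(ch)!r}")
```

Using NFKD rather than NFD also folds compatibility characters, so the ligature "ﬁ" becomes "fi" and a no-break space becomes an ordinary space; with NFD both would be removed, because neither has a canonical decomposition. Letters such as "ø" have no decomposition at all and fall through to removal.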
Use Cases
Security: Sanitize user input to prevent homoglyph attacks or injection of invisible/special characters
Data Cleaning: Prepare text data for systems that only support ASCII
Compatibility: Ensure text works in environments with limited character support
Standardization: Convert text with various Unicode characters to a standardized form