Character Encoding Detector

Character Encoding Detection: Core Features & Principles

When a text file or web page opens as gibberish (mojibake), the usual cause is that its bytes were decoded with the wrong encoding. A character encoding is a set of rules that maps characters to the numbers (bytes) a computer stores, and different encodings can interpret the exact same byte sequence in completely different ways. This tool identifies common encodings such as UTF-8 and GBK by analyzing the byte patterns in the text.
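To see how one byte sequence can mean different things, the following Python sketch decodes the four GBK bytes of "你好" under three encodings. Python is our choice for illustration, and nothing here describes how this tool works internally. GBK recovers the original text, Latin-1 produces unrelated accented letters, and UTF-8 finds invalid sequences, which `errors="replace"` turns into the replacement character U+FFFD.

```python
# Four bytes: "你好" as encoded in GBK (C4 E3 BA C3).
data = bytes.fromhex("c4e3bac3")

for enc in ("gbk", "latin-1", "utf-8"):
    # errors="replace" substitutes U+FFFD instead of raising on invalid bytes
    text = data.decode(enc, errors="replace")
    print(f"{enc:8} -> {text}")
```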

Why Use Our Character Encoding Detector?

Supports detection of 10+ mainstream encodings, including UTF-8/16/32, the GB series, Big5, and other Asian encodings.
Analyzes the byte characteristics of the text in real time, without relying on file metadata.
Provides a confidence score to help you evaluate the reliability of the detection results.
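One byte-level signal that needs no file metadata is a byte order mark (BOM) at the start of UTF-8, UTF-16, or UTF-32 data. The sketch below is a hypothetical helper, `sniff_bom`, built on the BOM constants in Python's standard `codecs` module; it is not this tool's code. Most text (GBK, and much UTF-8) carries no BOM at all, so a real detector must fall back to statistical analysis when this check returns nothing.

```python
import codecs
from typing import Optional

# Checked in this order because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: return a codec name if data starts with a known BOM."""
    for bom, name in BOM_TABLE:
        if data.startswith(bom):
            return name
    return None  # no BOM: fall back to statistical detection

print(sniff_bom("hi".encode("utf-8-sig")))  # utf-8-sig
print(sniff_bom(b"plain ASCII"))             # None
```

A returned name can be passed to `bytes.decode`. The `utf-8-sig` codec drops the BOM while decoding; for UTF-16/32 data, Python's generic `utf-16` and `utf-32` codecs also consume a leading BOM, whereas the explicit `-le`/`-be` codecs keep it as U+FEFF.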

How to Use

Paste the gibberish text into the input box.
Click the "Detect Encoding" button.
View the detected encoding and its confidence score.

Frequently Asked Questions (FAQ)

Why are there multiple possible encodings in the results?
Many byte sequences are valid in more than one encoding, so several encodings can fit the same input. The tool lists every plausible encoding, sorted by confidence score.
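Ranking several candidates can be sketched as follows. This is a deliberately crude, hypothetical scorer, not the algorithm behind this tool's confidence score: each candidate is first decoded strictly (any invalid byte rules it out), and the survivors are scored by the share of characters that are printable ASCII, whitespace, or CJK Unified Ideographs (U+4E00 to U+9FFF). Both the candidate list and the scoring rule are assumptions made for illustration.

```python
# Hypothetical candidate list; a real tool would cover more encodings.
CANDIDATES = ["utf-8", "gbk", "big5", "shift_jis"]

def plausibility(text: str) -> float:
    """Crude stand-in for a confidence score: fraction of 'expected' characters."""
    if not text:
        return 0.0
    ok = sum(
        1
        for ch in text
        if (ch.isascii() and (ch.isprintable() or ch.isspace()))
        or ("\u4e00" <= ch <= "\u9fff")
    )
    return ok / len(text)

def rank_encodings(data: bytes, candidates=CANDIDATES):
    results = []
    for enc in candidates:
        try:
            text = data.decode(enc)  # strict: invalid bytes raise
        except UnicodeDecodeError:
            continue  # rule this candidate out entirely
        results.append((enc, plausibility(text)))
    # sorted() is stable, so ties keep the candidate-list order
    return sorted(results, key=lambda r: r[1], reverse=True)

print(rank_encodings("你好".encode("gbk")))
```

For the GBK bytes of "你好", UTF-8 is eliminated and GBK ranks first. Other candidates such as Big5 may also decode those bytes, which is exactly the kind of overlap that produces several results; detectors such as Mozilla's universal charset detector typically separate such candidates with character-frequency statistics rather than a simple range check.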

How do I fix the "锟斤拷" (replacement character) mojibake?
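This pattern is the trace of a two-step error. First, bytes that were not valid UTF-8 (typically GBK text) were decoded as UTF-8, and each invalid sequence was replaced with the Unicode replacement character U+FFFD, which UTF-8 stores as EF BF BD. Later, that UTF-8 data was read as GBK, where the byte pairs EF BF, BD EF, and BF BD display as 锟, 斤, and 拷. Because the original bytes were discarded in the first step, reopening the damaged file with a different encoding cannot restore the text. Use this tool to confirm the encoding of an undamaged copy of the source, then open that copy with the correct encoding.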
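The chain of errors can be reproduced with Python's standard codecs. In this illustration "你好" stands in for arbitrary GBK text; the last step shows that decoding the saved bytes as UTF-8 yields only replacement characters, so the original cannot be recovered from the damaged copy.

```python
original = "你好".encode("gbk")  # the real data: C4 E3 BA C3

# Step 1: a UTF-8 decoder meets invalid sequences and substitutes U+FFFD.
misread = original.decode("utf-8", errors="replace")

# Step 2: the damaged text is saved as UTF-8; each U+FFFD becomes EF BF BD.
saved = misread.encode("utf-8")

# Step 3: the saved bytes are later opened as GBK, producing "锟斤拷..." runs.
shown = saved.decode("gbk", errors="replace")
print(shown)

# Re-reading the saved bytes as UTF-8 gives back only U+FFFD characters:
# the original C4 E3 BA C3 bytes are gone.
print(saved.decode("utf-8") == "你好")  # False
```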

Important Notes

Detection accuracy may be lower for short texts (under 50 characters), because a few bytes offer little statistical evidence. Binary files have no text encoding, so they cannot be analyzed. For text that mixes several encodings, only the dominant one is identified.
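Short input is hard partly because many encodings share the ASCII range. The following Python snippet, using an arbitrary sample string, encodes the same ASCII-only text with five ASCII-compatible codecs and finds that they all produce identical bytes, leaving a detector nothing to distinguish them by.

```python
sample = "Hello, world"
codecs_to_try = ["utf-8", "gbk", "big5", "shift_jis", "latin-1"]

# Encode the same text with every codec and collect the distinct byte strings.
encoded = {enc: sample.encode(enc) for enc in codecs_to_try}
distinct = set(encoded.values())

print(len(distinct))  # 1: every codec produced the same bytes
```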

Technical Notes & Best Practices

We strongly recommend using UTF-8 throughout development. When an encoding is unknown, byte length is a useful clue: the Chinese characters "你好" ("hello") take 2 bytes each in GBK (C4 E3, BA C3) but 3 bytes each in UTF-8 (E4 BD A0, E5 A5 BD). Such byte-length differences often allow a preliminary guess at the encoding.
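The byte counts above can be checked with Python's built-in codecs. This is a verification sketch only; `bytes.hex` with a separator argument requires Python 3.8 or later.

```python
word = "你好"  # "hello"

# Encode the same two characters with each encoding and compare the bytes.
encoded = {enc: word.encode(enc) for enc in ("gbk", "utf-8")}
for enc, raw in encoded.items():
    per_char = len(raw) // len(word)
    print(f"{enc}: {len(raw)} bytes, {per_char} per character -> {raw.hex(' ')}")
```

It prints `gbk: 4 bytes, 2 per character -> c4 e3 ba c3` and `utf-8: 6 bytes, 3 per character -> e4 bd a0 e5 a5 bd`.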