File Type Detector

Identify a file from its magic bytes — catch mislabeled extensions

Drop any file. The first 4 KB of bytes are inspected to identify the real format.

Max file size: 50MB

Files are processed in your browser. Nothing is uploaded.

How to Use File Type Detector Online

Drop any file to identify its true format from the magic bytes at the start of the file. The tool also flags mismatches between the extension and the real content.

Drop any file onto the page. JPG, PDF, ZIP, executables, fonts — anything.
The first 4 KB of bytes is read locally. The header bytes (first 32) are shown in hex.
If the bytes match a known format, the detected type(s) appear. If the file extension matches the bytes, you get a green confirmation. If they differ, a yellow warning explains what was detected vs. what the name claims.
Some formats share magic bytes (ZIP / Office / Java / Android all start with `PK`); multiple matches are normal for those.
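
The hex view of the header bytes described in the steps above can be approximated with a few lines of TypeScript. This is a minimal sketch, not the tool's actual code; `headerHex` is a hypothetical helper name, and the default of 32 bytes simply mirrors the number quoted above.

```typescript
// Format the first `count` bytes as space-separated, uppercase hex pairs.
// `headerHex` is a hypothetical helper, not part of the tool or of magic-bytes.js.
function headerHex(bytes: Uint8Array, count: number = 32): string {
  const end = Math.min(bytes.length, count);
  const pairs: string[] = [];
  for (let i = 0; i < end; i++) {
    const hex = bytes[i].toString(16).toUpperCase();
    pairs.push(hex.length < 2 ? "0" + hex : hex); // pad single digits to two
  }
  return pairs.join(" ");
}

// The eight-byte PNG signature
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(headerHex(pngSignature)); // 89 50 4E 47 0D 0A 1A 0A
```

Files shorter than 32 bytes simply produce a shorter string, and an empty file produces an empty one.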

About File Type Detector

File extensions are convenient and unreliable. The extension is just text appended to a filename: anyone can change it, the OS sometimes guesses wrong, downloads from URLs without extensions land with default ones, and file transfers across operating systems re-derive extensions in unpredictable ways. The actual format of a file is determined by its content, specifically the first few bytes, which contain a format-defining signature called "magic bytes." This tool reads those magic bytes and tells you what the file really is.

**How magic bytes work**. Most common binary file formats have a signature at a fixed position near the beginning of the file. JPEG files start with the bytes FF D8 FF. PNG files start with 89 50 4E 47 0D 0A 1A 0A. PDF files start with the ASCII characters %PDF-. ZIP files (and everything built on ZIP, such as Office docs, JARs, and APKs) start with PK followed by control bytes. ELF executables on Linux start with 7F 45 4C 46. Mach-O executables on macOS start with CF FA ED FE (64-bit), CE FA ED FE (32-bit), or CA FE BA BE (universal binaries). These signatures are defined in the format specs and don't depend on the filename. Reading the first 4 KB of a file and matching it against a table of known signatures identifies 100+ common formats.

The library powering this tool, **magic-bytes.js**, maintains that table. It's a small TypeScript port of the libmagic database (the same database used by the Linux `file` command). The tool lazy-loads it on the first detection (~20 KB) and runs the match in milliseconds.
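
The matching step itself is simple enough to sketch by hand. The block below is an illustration under stated assumptions: the `Signature` type, the `SIGNATURES` table, and the `detectTypes` function are hypothetical names invented for this example, the table holds only a handful of the signatures quoted above, and none of it is the library's code or API.

```typescript
// A hand-written, illustrative signature table. The real table is far larger.
interface Signature {
  name: string;
  offset: number; // position of the signature within the file
  bytes: number[]; // expected byte values at that position
}

const SIGNATURES: Signature[] = [
  { name: "JPEG", offset: 0, bytes: [0xff, 0xd8, 0xff] },
  { name: "PNG", offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { name: "PDF", offset: 0, bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { name: "ZIP", offset: 0, bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK\x03\x04"
  { name: "ELF", offset: 0, bytes: [0x7f, 0x45, 0x4c, 0x46] },
];

// True when the header contains the signature's bytes at its offset.
function matchesAt(head: Uint8Array, sig: Signature): boolean {
  if (head.length < sig.offset + sig.bytes.length) return false;
  for (let i = 0; i < sig.bytes.length; i++) {
    if (head[sig.offset + i] !== sig.bytes[i]) return false;
  }
  return true;
}

// Returns the names of every signature that matches; an empty array means "not recognized".
function detectTypes(head: Uint8Array): string[] {
  return SIGNATURES.filter((sig) => matchesAt(head, sig)).map((sig) => sig.name);
}

console.log(detectTypes(new Uint8Array([0xff, 0xd8, 0xff, 0xe0]))); // [ 'JPEG' ]
```

Returning an array rather than a single name leaves room for formats that share a signature, which is why multiple matches can be reported for one file.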

**Common use cases**:

- **"I got a file with no extension and I don't know what to open it with."** Drop it, and the tool tells you the type so you know which app to use. (Sometimes drag-to-app works without knowing the type, but for files that need a specific opener, knowing the type first is faster.)
- **"This .jpg file won't open as an image."** Drop it; it might really be a PDF, a video file, or something else. Renaming it to the correct extension usually fixes the issue.
- **"I downloaded this file from a sketchy site and want to check what it is."** Use this as a first-line safety check before opening. If a file claiming to be a .docx is really an .exe, do not open it.
- **"I'm building a file-handling system and need to verify uploaded files."** See what magic-bytes detection produces for various inputs, then test your validation logic against those results.
- **"My image file is corrupt and I want to see what's actually in it."** Tools that fail to open the file still leave you in the dark about format. Magic-bytes detection at least tells you what the file claims to be.

**Mismatch types and what they mean**:

- **Mismatch with one extra format detected**: the file is the wrong type. Either someone renamed it deliberately or it was incorrectly named on download. Common: .jpg that's actually .png; .pdf that's actually .docx; .zip that's actually .gz.
- **Mismatch with no recognized format**: the file is in an obscure format that magic-bytes doesn't know about, or it's plain text (which has no magic bytes), or it's corrupted. Check the header-bytes hex view for clues.
- **Multiple detected formats for one file**: usually a "ZIP-family" file (Office docs, JAR, etc.) where multiple distinct format names share the same magic-byte signature. This is normal.
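
The comparison between the declared extension and the detected formats can be sketched as follows. This is an assumption-laden illustration: the `Verdict` type, the `EXTENSIONS_BY_FORMAT` map, and both function names are hypothetical, and the map covers only a few formats.

```typescript
type Verdict = "match" | "mismatch" | "unrecognized";

// Extensions treated as consistent with each detected format (illustrative subset).
const EXTENSIONS_BY_FORMAT: Record<string, string[]> = {
  JPEG: ["jpg", "jpeg"],
  PNG: ["png"],
  PDF: ["pdf"],
  ZIP: ["zip", "docx", "xlsx", "pptx", "jar", "apk"],
};

// Lowercased text after the last dot, or "" when the name has no extension.
function declaredExtension(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(dot + 1).toLowerCase() : "";
}

// "match" if any detected format accepts the extension, "mismatch" if none does,
// "unrecognized" if nothing was detected at all.
function compareExtension(fileName: string, detected: string[]): Verdict {
  if (detected.length === 0) return "unrecognized";
  const ext = declaredExtension(fileName);
  for (const format of detected) {
    const allowed = EXTENSIONS_BY_FORMAT[format] || [];
    if (allowed.indexOf(ext) !== -1) return "match";
  }
  return "mismatch";
}

console.log(compareExtension("photo.jpg", ["PNG"])); // mismatch
console.log(compareExtension("report.docx", ["ZIP"])); // match
```

Accepting a match from any detected format keeps ZIP-family files from being flagged just because several names share one signature.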

**Plain text files**: most text files don't have magic bytes (no signature is part of the text-file spec). The detector will return "Format not recognized" for them. To distinguish from a corrupt binary, check the header bytes — text files have printable ASCII at the start, while corrupt binaries have arbitrary bytes.
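
The manual check on the header bytes can be turned into a rough heuristic. The sketch below assumes that "printable" means the ASCII range 0x20 to 0x7E plus tab, line feed, and carriage return; `looksLikeAsciiText` is a hypothetical name, and the tool itself is not stated to run such a check.

```typescript
// Returns true when every inspected byte is printable ASCII, tab, LF, or CR.
// This is a heuristic only: UTF-8 text containing non-ASCII characters returns false.
function looksLikeAsciiText(head: Uint8Array, limit: number = 32): boolean {
  const end = Math.min(head.length, limit);
  if (end === 0) return false; // an empty file gives no evidence either way
  for (let i = 0; i < end; i++) {
    const b = head[i];
    const printable = b >= 0x20 && b <= 0x7e;
    const whitespace = b === 0x09 || b === 0x0a || b === 0x0d;
    if (!printable && !whitespace) return false;
  }
  return true;
}

console.log(looksLikeAsciiText(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a]))); // true ("hello\n")
console.log(looksLikeAsciiText(new Uint8Array([0x89, 0x50, 0x4e, 0x47]))); // false (0x89 is not ASCII)
```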

**Format spoofing**: giving a malicious file a benign extension to bypass filters is a real attack pattern. This tool catches the easy cases (binary file with text extension, executable with image extension). Sophisticated attacks can defeat magic-bytes detection by crafting files that carry legitimate magic bytes for one format but also contain a malicious payload (e.g., a "polyglot" file that is both a valid PDF and valid HTML and contains malicious JavaScript). Magic-bytes detection is the first line of defense, not the last; for real security, sandboxed execution or AV scanning is needed.

**Limits**:

- Doesn't validate that the rest of the file is well-formed. A file with PDF magic bytes followed by random garbage detects as PDF; whether it actually opens as a PDF requires opening it in a PDF viewer.
- Doesn't decode embedded metadata. For EXIF data on photos, use the <a href="/tools/exif-viewer">EXIF Viewer</a>. For PDF metadata, use the <a href="/tools/pdf-metadata">PDF Metadata Viewer</a>.
- Doesn't detect text encodings. UTF-8 vs UTF-16 vs Latin-1 are text-content properties, not magic-byte properties (except UTF-8 with BOM and UTF-16 with BOM, which have signatures).
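
The BOM exception in the last point is easy to illustrate, because byte order marks are fixed byte sequences at offset 0. The sketch below is a standalone example with a hypothetical `detectBom` function; it is not a claim about what the tool or its library reports for such files.

```typescript
// Returns the encoding implied by a leading byte order mark, or null if there is none.
// UTF-32 BOMs are ignored in this sketch, so FF FE 00 00 is reported as UTF-16LE.
function detectBom(head: Uint8Array): string | null {
  if (head.length >= 3 && head[0] === 0xef && head[1] === 0xbb && head[2] === 0xbf) {
    return "UTF-8";
  }
  if (head.length >= 2 && head[0] === 0xff && head[1] === 0xfe) return "UTF-16LE";
  if (head.length >= 2 && head[0] === 0xfe && head[1] === 0xff) return "UTF-16BE";
  return null;
}

console.log(detectBom(new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]))); // UTF-8
console.log(detectBom(new Uint8Array([0x68, 0x69]))); // null
```

Text without a BOM returns `null`, which is the common case and the reason encoding detection falls outside magic-byte matching.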

**Privacy**. The file's first 4 KB is read into your browser's memory via the File API. magic-bytes.js runs the signature match locally. Results display in your tab. No file data crosses the network. Verify in DevTools: drop any file and watch the network panel. Apart from the one-time lazy load of the detection library on the first detection, no requests appear. Bytes beyond the first 4 KB are not even read, let alone transmitted.
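
Reading only the head of a file is a standard File API pattern: slice the blob first, then read the slice. The following is a minimal sketch assuming a runtime that provides `Blob.prototype.arrayBuffer` (current browsers and recent Node.js versions); `readHead` and `HEAD_BYTES` are hypothetical names, and the tool's actual code may differ.

```typescript
const HEAD_BYTES = 4096; // the 4 KB limit described above

// Reads at most the first 4 KB of a Blob. A browser File is a Blob, so a dropped
// file can be passed in directly. Bytes past the slice are never requested.
async function readHead(blob: Blob): Promise<Uint8Array> {
  const buffer = await blob.slice(0, HEAD_BYTES).arrayBuffer();
  return new Uint8Array(buffer);
}

// Usage with an in-memory blob standing in for a dropped file
readHead(new Blob([new Uint8Array(10000)])).then((head) => {
  console.log(head.length); // 4096
});
```

Because `slice` clamps to the blob's size, files smaller than 4 KB return whatever bytes are available, and an empty file returns an empty array.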

**Edge cases handled**: files smaller than 4 KB (reads what's available); empty files (reports "Format not recognized"); text files with no magic bytes (reports unknown); files with no extension (declared extension shown as "(none)"); files with multiple legitimate magic-byte matches (all reported); URL.createObjectURL not needed (no preview), so no lifecycle concerns.

Frequently Asked Questions

What are 'magic bytes' and why are they more reliable than file extensions?

The first few bytes of most file formats are a signature — a specific byte sequence that identifies the format. JPG files start with FF D8 FF. PNG files start with 89 50 4E 47 0D 0A 1A 0A. PDFs start with %PDF-. ZIP files start with PK followed by specific control bytes. These signatures are part of the format spec and don't depend on the filename. The file extension, on the other hand, is just metadata that anyone can change. A file named photo.jpg might actually be a PDF, a malicious executable, or anything else — the bytes are the truth.

Why might I get multiple detected types?

Because some formats share magic bytes. The most common case is ZIP. The ZIP magic bytes (`PK\x03\x04`) appear at the start of every ZIP archive, but they also appear at the start of every file that is *built on top of* ZIP: Office documents (.docx, .xlsx, .pptx), OpenDocument files (.odt, .ods, .odp), Java archives (.jar), Android APKs, Firefox extensions (.xpi), and ePub books. All of them are technically ZIPs with a specific internal structure. The detector reports all matching signatures; if your .docx shows up as 'ZIP archive + DOCX + JAR', that's correct, because from a byte-signature perspective it really is all three.
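
To see why the signature alone cannot separate these formats, it helps to look one step past it. A ZIP file begins with a local file header whose file-name length sits at offset 26 and whose name starts at offset 30. The sketch below reads that first entry name; `firstZipEntryName` is a hypothetical helper and is not something the tool is stated to do. The name is only a hint: EPUB requires the first entry to be called `mimetype`, but other ZIP-based formats generally make no such guarantee.

```typescript
// Returns the name of the first entry in a ZIP local file header, or null when the
// bytes are not a ZIP header or the name is truncated. Assumes an ASCII file name.
function firstZipEntryName(head: Uint8Array): string | null {
  if (head.length < 30) return null; // the fixed part of the header is 30 bytes
  if (head[0] !== 0x50 || head[1] !== 0x4b || head[2] !== 0x03 || head[3] !== 0x04) {
    return null; // not "PK\x03\x04"
  }
  const nameLength = head[26] | (head[27] << 8); // little-endian 16-bit length
  if (head.length < 30 + nameLength) return null;
  let entryName = "";
  for (let i = 0; i < nameLength; i++) {
    entryName += String.fromCharCode(head[30 + i]);
  }
  return entryName;
}
```

Everything that identifies a .docx or .jar as more than a plain ZIP lives in entries like this one, beyond the shared four-byte signature.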

What does an extension mismatch usually mean?

There are three common causes. **Renamed file**: someone changed the extension without converting the content. This is common with screenshots, downloads, and file transfers where the OS guesses extensions wrong. **Wrong-suffix download**: a file served from a web URL with no extension hint sometimes lands with the wrong default extension. **Intentional renaming for transport**: some email systems block .exe attachments, so attackers rename them to .jpg or .pdf. If you didn't expect a mismatch, treat it as suspicious.

Can I detect malicious files this way?

Partially. The detector tells you what the bytes are, which catches obvious extension-spoofing (renamed executables, mislabeled scripts, etc.). It does NOT do antivirus-style analysis of content for malicious intent. A genuinely formatted PDF that contains a malicious JavaScript payload still detects as PDF, and correctly so. For full malware analysis, dedicated tools (Microsoft Defender, ClamAV, VirusTotal) are the right answer. This tool is a first-line check.

How many formats does it know?

The magic-bytes.js library detects 100+ common formats, including images (JPG, PNG, GIF, WebP, BMP, TIFF, ICO, HEIC, AVIF), documents (PDF, DOC, DOCX, ODT, RTF), archives (ZIP, RAR, 7Z, GZIP, TAR, BZ2, XZ), media (MP3, MP4, MOV, WEBM, OGG, FLAC, WAV), executables (EXE, ELF, Mach-O), fonts (TTF, OTF, WOFF, WOFF2), and many more. The full list of supported formats is in the project repository at github.com/LarsKoelpin/magic-bytes.

Does the detector read my whole file?

No. It reads only the first 4 KB, which is plenty for every supported format because magic bytes sit at or near the very start of the file. For a 1 GB file, the detector still works fast because it's only looking at the first 4096 bytes. The remaining content isn't read or transmitted.

Is the file uploaded?

No. The detection runs entirely in your browser. The file is read into your tab's memory via the File API; magic-bytes.js inspects the bytes locally; results display in your tab. Verify in DevTools: drop a file and watch the network panel. Apart from the one-time lazy load of the detection library on the first detection, no requests appear.