Architecture
MarkItDown uses a modular converter architecture in which each converter handles specific file types. All converters inherit from the DocumentConverter base class and implement two key methods:
- accepts() - Determines if the converter can handle a given file based on MIME type, extension, or URL
- convert() - Performs the actual conversion to Markdown
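As a rough illustration of this two-method contract, the sketch below defines a stand-in base class and one toy converter. The names SketchConverter and PlainTextSketchConverter, and the exact parameters of accepts() and convert(), are assumptions made for this example; the library's real signatures may differ.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SketchConverter(ABC):
    """Hypothetical stand-in for a converter base class (not the library's real signature)."""

    @abstractmethod
    def accepts(self, mimetype: Optional[str], extension: Optional[str], url: Optional[str]) -> bool:
        """Return True if this converter can handle the described input."""

    @abstractmethod
    def convert(self, data: bytes) -> str:
        """Return the input rendered as Markdown."""


class PlainTextSketchConverter(SketchConverter):
    """Toy converter: accepts .txt files or text/plain content and returns it unchanged."""

    def accepts(self, mimetype, extension, url):
        return extension == ".txt" or (mimetype or "").startswith("text/plain")

    def convert(self, data):
        return data.decode("utf-8")


converter = PlainTextSketchConverter()
# Ask the converter first, and only convert if it claims the input.
if converter.accepts(mimetype=None, extension=".txt", url=None):
    print(converter.convert(b"hello"))
```

Keeping the "can I handle this?" check separate from the conversion itself lets a dispatcher query many converters cheaply before committing to one.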
Base Classes
DocumentConverter
The abstract base class for all converters. Key methods: accepts() and convert().
DocumentConverterResult
The result object returned by all converters. Properties:
- markdown (str) - The converted Markdown text
- title (Optional[str]) - Optional document title extracted from the file
- text_content (str) - Deprecated alias for markdown
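The shape of the result object can be pictured with a small stand-in class. This is only a sketch under the assumption that text_content simply mirrors markdown; SketchResult is a hypothetical name, not the library's class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in mirroring the properties listed for the result object."""

    markdown: str
    title: Optional[str] = None

    @property
    def text_content(self) -> str:
        # Deprecated alias: returns the same string as `markdown`.
        return self.markdown


result = SketchResult(markdown="# Report\n\nBody text.", title="Report")
print(result.title)
print(result.markdown)
```

New code should read markdown directly; the alias exists only so that older callers keep working.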
Available Converters
Document Formats
PDF
Convert PDF files with table extraction support
DOCX
Convert Word documents preserving styles and tables
PPTX
Convert PowerPoint presentations with images and charts
XLSX/XLS
Convert Excel spreadsheets to Markdown tables
Media Formats
Images
Extract metadata and generate AI descriptions for images
Audio
Transcribe audio files and extract metadata
Web & Markup
HTML
Convert HTML to clean Markdown
Other Converters
Plain text, CSV, Jupyter notebooks, ZIP archives, EPUB, RSS, Wikipedia, YouTube, Bing search, Outlook messages, and Azure Document Intelligence
Converter Selection
MarkItDown automatically selects the appropriate converter based on:
- File Extension - Primary method (e.g., .pdf, .docx)
- MIME Type - Secondary method from HTTP headers or file detection
- URL Pattern - For special web converters (Wikipedia, YouTube, Bing)
- Content Inspection - Some converters peek at file content to confirm format
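The selection order listed above can be sketched as a simple fall-through function. This is an illustration only: pick_converter, the lookup tables, and the returned labels are hypothetical, and the library's actual dispatch logic may be organized differently.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Illustrative lookup tables; the real converter registry is not shown in this section.
BY_EXTENSION = {".pdf": "pdf", ".docx": "docx", ".html": "html"}
BY_MIMETYPE = {"application/pdf": "pdf", "text/html": "html"}
BY_URL_HOST = {"wikipedia.org": "wikipedia", "youtube.com": "youtube"}


def pick_converter(source: str, mimetype: Optional[str] = None, head: bytes = b"") -> Optional[str]:
    """Return a converter label for a path or URL, or None if nothing matches."""
    parsed = urlparse(source)

    # 1. File extension (primary)
    extension = os.path.splitext(parsed.path)[1].lower()
    if extension in BY_EXTENSION:
        return BY_EXTENSION[extension]

    # 2. MIME type (secondary), ignoring parameters such as "; charset=utf-8"
    if mimetype:
        base_type = mimetype.split(";")[0].strip().lower()
        if base_type in BY_MIMETYPE:
            return BY_MIMETYPE[base_type]

    # 3. URL pattern for special web converters
    host = (parsed.hostname or "").lower()
    for domain, name in BY_URL_HOST.items():
        if host == domain or host.endswith("." + domain):
            return name

    # 4. Content inspection: PDF files begin with the bytes "%PDF-"
    if head.startswith(b"%PDF-"):
        return "pdf"

    return None


print(pick_converter("report.pdf"))
print(pick_converter("https://en.wikipedia.org/wiki/Markdown"))
```

Cheap checks such as the extension run first, and the content peek is kept as a last resort because it requires reading from the input.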
Common Options
Many converters accept these optional parameters via **kwargs:
OpenAI-compatible client for AI-powered features (image captioning, etc.)
Model name to use with the LLM client (e.g., “gpt-4o”)
Custom prompt for LLM operations
Path to exiftool binary for metadata extraction
Embed images as base64 data URIs instead of file references
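To show how optional parameters passed via **kwargs might be consumed, here is a minimal sketch of a converter-like function that uses an LLM client when one is supplied and falls back gracefully otherwise. The function caption_image, the stand-in fake_client, the default prompt text, and the option names llm_client, llm_model, and llm_prompt are all assumptions for this example, since this section does not spell out the parameter names.

```python
from typing import Any, Callable, Optional


def caption_image(file_name: str, **kwargs: Any) -> str:
    """Hypothetical converter step: build a Markdown image reference, captioned if possible."""
    # Option names are assumptions for this sketch; unknown options are simply ignored.
    client: Optional[Callable[[str, str, str], str]] = kwargs.get("llm_client")
    model: Optional[str] = kwargs.get("llm_model")
    prompt: str = kwargs.get("llm_prompt") or "Describe this image."

    # Without both a client and a model, skip captioning instead of failing.
    if client is None or model is None:
        return f"![]({file_name})"

    description = client(model, prompt, file_name)
    return f"![{description}]({file_name})"


def fake_client(model: str, prompt: str, file_name: str) -> str:
    """Stand-in for a real LLM client call, so the sketch runs offline."""
    return f"{model} caption for {file_name}"


print(caption_image("cat.png"))
print(caption_image("cat.png", llm_client=fake_client, llm_model="gpt-4o"))
```

Reading options with kwargs.get() means a converter only picks up the settings it understands, so the same set of options can be handed to every converter.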
Error Handling
Converters may raise these exceptions:
- MissingDependencyException - Required dependency not installed
- FileConversionException - Conversion failed for a supported file type
- UnsupportedFormatException - No converter available for the file type
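One way a caller might handle these three cases separately is sketched below. The exception classes are local stand-ins with the same names so that the example runs without the library installed; with the library available you would import the real classes instead (the import path is not given in this section). The helper convert_with_fallback and the messages it returns are hypothetical.

```python
from typing import Callable


# Local stand-ins so the sketch is self-contained; not the library's own classes.
class MissingDependencyException(Exception):
    """Stand-in: a required dependency is not installed."""


class FileConversionException(Exception):
    """Stand-in: conversion failed for a supported file type."""


class UnsupportedFormatException(Exception):
    """Stand-in: no converter is available for the file type."""


def convert_with_fallback(convert: Callable[[str], str], source: str) -> str:
    """Run `convert` and turn each known failure into a short placeholder string."""
    try:
        return convert(source)
    except MissingDependencyException as exc:
        return f"[skipped {source}: missing dependency: {exc}]"
    except UnsupportedFormatException:
        return f"[skipped {source}: no converter for this file type]"
    except FileConversionException as exc:
        return f"[failed {source}: {exc}]"


def always_unsupported(source: str) -> str:
    """Toy conversion function that always reports an unsupported format."""
    raise UnsupportedFormatException(source)


print(convert_with_fallback(always_unsupported, "data.xyz"))
```

Treating the cases separately is useful in batch jobs: a missing dependency is fixable by installing a package, while an unsupported format usually means the file should be skipped.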