The MarkItDown class is the primary interface for converting various document formats to Markdown. It manages converter registration, file type detection, and the conversion process.

Constructor

MarkItDown(
    *,
    enable_builtins: Union[None, bool] = None,
    enable_plugins: Union[None, bool] = None,
    **kwargs
)
Create a new MarkItDown instance.
enable_builtins
bool | None
default:"True"
Enable built-in converters. When None or True, built-in converters are automatically registered.
enable_plugins
bool | None
default:"False"
Enable plugin converters. When True, converters from installed plugins are registered.
The following options are accepted through **kwargs:
requests_session
requests.Session
Custom requests session for HTTP operations. If not provided, a default session is created whose Accept header prefers text/markdown, then text/html and text/plain.
llm_client
Any
LLM client instance (for example, an OpenAI client) used by converters that support AI-powered conversion, such as image descriptions. Use it together with llm_model.
llm_model
str
Model name to use with the LLM client.
llm_prompt
str
Custom prompt to use with LLM-based converters.
exiftool_path
str
Path to the exiftool binary for image and audio metadata extraction. If not provided, the EXIFTOOL_PATH environment variable is checked, then a few common system locations are searched.
style_map
str
Custom style map for DOCX conversion, passed to the mammoth library.
docintel_endpoint
str
Azure Document Intelligence endpoint URL. When provided, enables the Document Intelligence converter.
docintel_credential
Any
Credentials for Azure Document Intelligence.
docintel_file_types
list
File types to process with Document Intelligence.
docintel_api_version
str
API version for Document Intelligence service.

Example

from markitdown import MarkItDown

# Basic usage with defaults
md = MarkItDown()

# With custom configuration
md = MarkItDown(
    enable_builtins=True,
    enable_plugins=False,
    exiftool_path="/usr/local/bin/exiftool"
)

# With Azure Document Intelligence
from azure.core.credentials import AzureKeyCredential

md = MarkItDown(
    docintel_endpoint="https://your-resource.cognitiveservices.azure.com/",
    docintel_credential=AzureKeyCredential("<your-api-key>")
)

Methods

convert()

def convert(
    source: Union[str, requests.Response, Path, BinaryIO],
    *,
    stream_info: Optional[StreamInfo] = None,
    **kwargs: Any
) -> DocumentConverterResult
Convert a document from various source types to Markdown.
source
str | requests.Response | Path | BinaryIO
required
The source to convert. Can be:
  • Local file path (str or Path)
  • URI string (http://, https://, file://, or data: URIs)
  • requests.Response object
  • Binary file-like object (BinaryIO); text-mode streams such as io.StringIO raise TypeError
stream_info
StreamInfo
Optional metadata about the source. If not provided, MarkItDown attempts to infer it.
result
DocumentConverterResult
The conversion result containing the Markdown text and optional metadata.

Example

# Convert local file
result = md.convert("document.pdf")
print(result.markdown)

# Convert URL
result = md.convert("https://example.com/file.docx")

# Convert with explicit stream info
from markitdown import StreamInfo
result = md.convert(
    "file.txt",
    stream_info=StreamInfo(charset="utf-8", mimetype="text/plain")
)

# Convert binary stream
with open("file.pdf", "rb") as f:
    result = md.convert(f)
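The dispatch rules above can be summarized as a small routing function. This is a minimal, stdlib-only sketch, not MarkItDown's code: `route_source` and `URI_PREFIXES` are hypothetical names, the requests.Response branch is left out so no third-party package is needed, and rejecting text-mode streams is an assumption that follows from the BinaryIO requirement.

```python
import io
from pathlib import Path
from typing import Any

# Hypothetical: string prefixes treated as URIs rather than local paths
URI_PREFIXES = ("http:", "https:", "file:", "data:")


def route_source(source: Any) -> str:
    """Return the name of the method that would handle `source` (sketch only).

    The requests.Response case is omitted to keep this stdlib-only.
    """
    if isinstance(source, str):
        # URI-looking strings go to convert_uri(); anything else is a local path
        if source.startswith(URI_PREFIXES):
            return "convert_uri"
        return "convert_local"
    if isinstance(source, Path):
        return "convert_local"
    # Binary file-like objects need a callable read() and must not be text-mode
    read = getattr(source, "read", None)
    if callable(read) and not isinstance(source, io.TextIOBase):
        return "convert_stream"
    raise TypeError(f"Invalid source type: {type(source)}")


print(route_source("https://example.com/file.docx"))  # convert_uri
print(route_source(Path("document.pdf")))             # convert_local
print(route_source(io.BytesIO(b"%PDF-1.7")))          # convert_stream
```

With this sketch, a plain string such as "report.pdf" routes to convert_local, while an io.StringIO raises TypeError.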

convert_local()

def convert_local(
    path: Union[str, Path],
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,  # Deprecated
    url: Optional[str] = None,  # Deprecated
    **kwargs: Any
) -> DocumentConverterResult
Convert a local file to Markdown.
path
str | Path
required
Path to the local file to convert.
stream_info
StreamInfo
Optional metadata about the file.
result
DocumentConverterResult
The conversion result.

Example

result = md.convert_local("/path/to/document.docx")
print(f"Title: {result.title}")
print(result.markdown)
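When no stream_info is given, the metadata has to be guessed. The sketch below shows one simple way to derive the obvious fields from a local path with the standard library. It is an illustration under stated assumptions, not MarkItDown's inference logic, which may also inspect file contents; `PathGuess` and `guess_from_path` are hypothetical names whose fields mirror StreamInfo attributes.

```python
import mimetypes
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathGuess:
    # Hypothetical container; field names mirror StreamInfo attributes
    local_path: str
    filename: str
    extension: Optional[str]
    mimetype: Optional[str]


def guess_from_path(path: str) -> PathGuess:
    filename = os.path.basename(path)
    # splitext keeps the leading dot: "report.pdf" -> ".pdf"
    _, ext = os.path.splitext(filename)
    # guess_type returns (mimetype, encoding); mimetype is None if unknown
    mimetype, _ = mimetypes.guess_type(filename)
    return PathGuess(
        local_path=path,
        filename=filename,
        extension=ext or None,
        mimetype=mimetype,
    )


guess = guess_from_path("/path/to/report.pdf")
print(guess.extension, guess.mimetype)
```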

convert_stream()

def convert_stream(
    stream: BinaryIO,
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,  # Deprecated
    url: Optional[str] = None,  # Deprecated
    **kwargs: Any
) -> DocumentConverterResult
Convert a binary stream to Markdown.
stream
BinaryIO
required
Binary file-like object to convert. Must support read(). If the stream is not seekable, it is first read fully into memory.
stream_info
StreamInfo
Optional metadata about the stream. Used for format detection.
result
DocumentConverterResult
The conversion result.

Example

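import io
from markitdown import StreamInfo

# Convert in-memory bytes (here read from disk; could come from a database or network)
with open("document.pdf", "rb") as f:
    data = f.read()
result = md.convert_stream(io.BytesIO(data))

# With stream info, which helps format detection when the bytes
# have no filename or extension attached
result = md.convert_stream(
    io.BytesIO(data),  # fresh stream so reading starts at the beginning
    stream_info=StreamInfo(extension=".pdf", mimetype="application/pdf")
)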
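The non-seekable case for convert_stream() can be pictured as copying the remaining bytes into an in-memory buffer. This is a minimal sketch, assuming a 4096-byte chunk size; `ensure_seekable` and the `NonSeekable` test double are hypothetical names, not part of MarkItDown.

```python
import io
from typing import BinaryIO


def ensure_seekable(stream: BinaryIO, chunk_size: int = 4096) -> BinaryIO:
    # Seekable streams can be rewound between detection and conversion: use as-is
    if stream.seekable():
        return stream
    # Otherwise copy the remaining bytes into an in-memory buffer, chunk by chunk
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer.write(chunk)
    buffer.seek(0)
    return buffer


class NonSeekable(io.RawIOBase):
    # Hypothetical test double: a readable stream that reports seekable() == False
    def __init__(self, data: bytes) -> None:
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        return self._inner.readinto(b)

    def seekable(self) -> bool:
        return False


buffered = ensure_seekable(NonSeekable(b"%PDF-1.7 ..."))
print(buffered.seekable())  # True
```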

convert_url()

def convert_url(
    url: str,
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,  # Deprecated
    mock_url: Optional[str] = None,
    **kwargs: Any
) -> DocumentConverterResult
Convert a URL to Markdown. This is an alias for convert_uri().
url
str
required
URL to convert (http://, https://, file://, or data:).
stream_info
StreamInfo
Optional metadata override.
mock_url
str
Pretend the content came from this URL instead (for converter routing).
result
DocumentConverterResult
The conversion result.

Example

# Convert web page
result = md.convert_url("https://en.wikipedia.org/wiki/Python_(programming_language)")

# Convert file URI
result = md.convert_url("file:///path/to/document.pdf")

# Convert data URI
result = md.convert_url("data:text/plain;base64,SGVsbG8gV29ybGQ=")

convert_uri()

def convert_uri(
    uri: str,
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,  # Deprecated
    mock_url: Optional[str] = None,
    **kwargs: Any
) -> DocumentConverterResult
Convert a URI to Markdown. Supports the http://, https://, file://, and data: schemes.
uri
str
required
URI to convert. Supported schemes:
  • http:// and https://: Fetches the content via HTTP
  • file://: Reads a local file
  • data:: Decodes the inline payload (base64 or percent-encoded)
stream_info
StreamInfo
Optional metadata override.
mock_url
str
Mock the request as if it came from a different URL.
result
DocumentConverterResult
The conversion result.

Example

# HTTP URI
result = md.convert_uri("https://example.com/doc.pdf")

# File URI
result = md.convert_uri("file:///home/user/document.docx")

# Data URI
result = md.convert_uri(
    "data:text/html;charset=utf-8,%3Ch1%3EHello%3C%2Fh1%3E"
)
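To see what the data: branch has to do, here is a small stdlib-only parser for data URIs. It is a sketch under stated assumptions: a trailing ;base64 flag selects base64 decoding, other payloads are percent-decoded, and key=value parameters such as charset become attributes. `parse_data_uri` is a hypothetical name, not a documented MarkItDown function.

```python
import base64
from typing import Dict, Optional, Tuple
from urllib.parse import unquote_to_bytes


def parse_data_uri(uri: str) -> Tuple[Optional[str], Dict[str, str], bytes]:
    if not uri.startswith("data:"):
        raise ValueError("Not a data URI")
    header, sep, payload = uri.partition(",")
    if not sep:
        raise ValueError("Malformed data URI: missing ',' separator")

    parts = header[len("data:"):].split(";")
    # A trailing ";base64" flag means the payload is base64-encoded
    is_base64 = parts[-1] == "base64"
    if is_base64:
        parts.pop()

    # The first part, if non-empty, is the media type (e.g. "text/plain")
    mimetype: Optional[str] = None
    if parts:
        mimetype = parts.pop(0) or None

    # Remaining parts are key=value parameters such as charset=utf-8
    attributes: Dict[str, str] = {}
    for part in parts:
        key, _, value = part.partition("=")
        if key:
            attributes[key] = value

    data = base64.b64decode(payload) if is_base64 else unquote_to_bytes(payload)
    return mimetype, attributes, data


print(parse_data_uri("data:text/plain;base64,SGVsbG8gV29ybGQ="))
# ('text/plain', {}, b'Hello World')
```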

convert_response()

def convert_response(
    response: requests.Response,
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,  # Deprecated
    url: Optional[str] = None,  # Deprecated
    **kwargs: Any
) -> DocumentConverterResult
Convert an HTTP response to Markdown.
response
requests.Response
required
HTTP response object from the requests library.
stream_info
StreamInfo
Optional metadata override. By default, metadata is extracted from response headers.
result
DocumentConverterResult
The conversion result.

Example

import requests

response = requests.get("https://example.com/document.pdf")
response.raise_for_status()  # fail early on HTTP errors
result = md.convert_response(response)
print(result.markdown)
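Since convert_response() extracts metadata from response headers by default, the sketch below shows how a mimetype, charset, and filename could be read from Content-Type and Content-Disposition using the standard library's email.message.Message. Which headers MarkItDown actually consults is an assumption here, and `info_from_headers` is a hypothetical helper.

```python
from email.message import Message
from typing import Dict, Optional


def info_from_headers(headers: Dict[str, str]) -> Dict[str, Optional[str]]:
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    has_type = msg.get("Content-Type") is not None
    return {
        # get_content_type() lowercases and drops parameters, but defaults to
        # text/plain when the header is absent, so guard on presence first
        "mimetype": msg.get_content_type() if has_type else None,
        # charset parameter of Content-Type, lowercased; None if missing
        "charset": msg.get_content_charset(),
        # filename parameter of Content-Disposition; None if missing
        "filename": msg.get_filename(),
    }


print(info_from_headers({"Content-Type": "text/html; charset=UTF-8"}))
```

With requests, such a helper could be called as info_from_headers(dict(response.headers)).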

register_converter()

def register_converter(
    converter: DocumentConverter,
    *,
    priority: float = PRIORITY_SPECIFIC_FILE_FORMAT
) -> None
Register a custom document converter.
converter
DocumentConverter
required
The converter instance to register.
priority
float
default:"0.0"
Converter priority. Lower values are tried first; among converters with equal priority, the most recently registered one is tried first. Use:
  • PRIORITY_SPECIFIC_FILE_FORMAT (0.0) for specific formats
  • PRIORITY_GENERIC_FILE_FORMAT (10.0) for generic/catch-all converters

Example

from markitdown import (
    MarkItDown,
    DocumentConverter,
    DocumentConverterResult,
    PRIORITY_SPECIFIC_FILE_FORMAT,
)

class MyCustomConverter(DocumentConverter):
    def accepts(self, file_stream, stream_info, **kwargs):
        # Claim only inputs whose extension is ".custom" (extension may be None)
        return (stream_info.extension or "").lower() == ".custom"

    def convert(self, file_stream, stream_info, **kwargs):
        # Conversion logic goes here: read file_stream and build Markdown
        return DocumentConverterResult(markdown="# Custom content")

md = MarkItDown()
md.register_converter(
    MyCustomConverter(),
    priority=PRIORITY_SPECIFIC_FILE_FORMAT
)
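The ordering rule can be illustrated with a toy registry. This sketch assumes each new registration is inserted at the front of the list and that the list is stably sorted by priority before conversion, so equal-priority converters registered later are tried first. `Registry` is hypothetical and stores names instead of converter objects.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Registry:
    # Hypothetical stand-in for a converter list; stores (name, priority) pairs
    entries: List[Tuple[str, float]] = field(default_factory=list)

    def register(self, name: str, priority: float = 0.0) -> None:
        # Each new registration is inserted at the front of the list
        self.entries.insert(0, (name, priority))

    def try_order(self) -> List[str]:
        # sorted() is stable, so equal-priority entries keep their list order,
        # meaning the most recently registered one is tried first
        return [name for name, _ in sorted(self.entries, key=lambda e: e[1])]


registry = Registry()
registry.register("PlainText", priority=10.0)
registry.register("Html", priority=10.0)
registry.register("Docx")
registry.register("MyCustom")
print(registry.try_order())  # ['MyCustom', 'Docx', 'Html', 'PlainText']
```

The design choice this illustrates: a converter registered after the built-ins at the same priority takes precedence over them without any extra configuration.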

enable_builtins()

def enable_builtins(**kwargs) -> None
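Enable and register built-in converters. Built-in converters are enabled by default, so this method is only needed when the instance was created with enable_builtins=False. Call it at most once; calling it after built-ins are already enabled issues a RuntimeWarning and registers nothing.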
kwargs
dict
Configuration options passed to converters (llm_client, exiftool_path, etc.).

Example

# Create instance with builtins disabled
md = MarkItDown(enable_builtins=False)

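# Enable later with configuration
from openai import OpenAI

my_llm_client = OpenAI()  # any compatible client; OpenAI shown as an example
md.enable_builtins(
    exiftool_path="/usr/local/bin/exiftool",
    llm_client=my_llm_client,
    llm_model="gpt-4o"
)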
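The once-only rule for enable_builtins() amounts to a simple guard flag. This is a minimal sketch, assuming a repeated call should warn and register nothing; `BuiltinsToggle` is a hypothetical class, not part of MarkItDown.

```python
import warnings
from typing import Any, Dict, List, Tuple


class BuiltinsToggle:
    # Hypothetical class illustrating an "enable once" guard
    def __init__(self) -> None:
        self.enabled = False
        self.registered: List[Tuple[str, Dict[str, Any]]] = []

    def enable_builtins(self, **kwargs: Any) -> None:
        if self.enabled:
            # A second call changes nothing and only warns
            warnings.warn("Built-in converters are already enabled.", RuntimeWarning)
            return
        # Register converters here, forwarding kwargs as their configuration
        self.registered.append(("builtins", kwargs))
        self.enabled = True


toggle = BuiltinsToggle()
toggle.enable_builtins(exiftool_path="/usr/local/bin/exiftool")
```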

enable_plugins()

def enable_plugins(**kwargs) -> None
Enable and register converters provided by installed plugins. Plugins are disabled by default, so this method is needed when the instance was not created with enable_plugins=True. Call it at most once; a repeated call issues a RuntimeWarning and registers nothing.
kwargs
dict
Configuration options passed to plugin converters.

Example

# Create instance with plugins disabled (the default; shown explicitly here)
md = MarkItDown(enable_plugins=False)

# Enable later
md.enable_plugins()

Constants

PRIORITY_SPECIFIC_FILE_FORMAT

PRIORITY_SPECIFIC_FILE_FORMAT = 0.0
Priority value for converters that handle specific file formats (e.g., .docx, .pdf, .xlsx) or specific websites (e.g., Wikipedia, YouTube).

PRIORITY_GENERIC_FILE_FORMAT

PRIORITY_GENERIC_FILE_FORMAT = 10.0
Priority value for near catch-all converters that handle generic mimetypes (e.g., text/*, application/zip, text/html).

Built-in Converters

When enable_builtins=True (default), the following converters are automatically registered (priority 0.0 unless noted):
  • PlainTextConverter - Plain text files (priority 10.0)
  • HtmlConverter - HTML documents (priority 10.0)
  • ZipConverter - ZIP archives (priority 10.0)
  • RssConverter - RSS feeds
  • WikipediaConverter - Wikipedia pages
  • YouTubeConverter - YouTube videos
  • BingSerpConverter - Bing search results
  • DocxConverter - Microsoft Word documents
  • XlsxConverter - Excel spreadsheets (.xlsx)
  • XlsConverter - Excel spreadsheets (.xls)
  • PptxConverter - PowerPoint presentations
  • PdfConverter - PDF documents
  • ImageConverter - Image files (EXIF metadata via exiftool; optional LLM-generated description)
  • AudioConverter - Audio files with transcription
  • IpynbConverter - Jupyter notebooks
  • OutlookMsgConverter - Outlook email messages
  • EpubConverter - EPUB ebooks
  • CsvConverter - CSV files
  • DocumentIntelligenceConverter - Azure Document Intelligence (when endpoint provided)

Error Handling

The convert() methods may raise the following exceptions:
  • FileConversionException - Converter attempted conversion but failed
  • UnsupportedFormatException - No converter can handle the format
  • MissingDependencyException - Required dependency not installed
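
Example

from markitdown import MarkItDown, FileConversionException, UnsupportedFormatException

md = MarkItDown()

try:
    result = md.convert("document.xyz")
except UnsupportedFormatException:
    print("This file format is not supported")
except FileConversionException as e:
    print(f"Conversion failed: {e}")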
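The difference between UnsupportedFormatException and FileConversionException comes down to whether any converter accepted the input. The sketch below models that control flow with hypothetical exception classes (UnsupportedFormat, ConversionFailed) so it runs without markitdown installed; it assumes converters are tried in order and that a failure lets the next accepting converter try.

```python
from typing import Callable, List, Tuple


class UnsupportedFormat(Exception):
    """Hypothetical: no converter accepted the input."""


class ConversionFailed(Exception):
    """Hypothetical: some converter accepted the input, but every attempt failed."""

    def __init__(self, errors: List[Exception]) -> None:
        super().__init__(f"{len(errors)} converter(s) failed")
        self.errors = errors


# A converter is modeled as an (accepts, convert) pair of callables
Converter = Tuple[Callable[[bytes], bool], Callable[[bytes], str]]


def run_converters(data: bytes, converters: List[Converter]) -> str:
    errors: List[Exception] = []
    for accepts, convert in converters:
        if not accepts(data):
            continue
        try:
            return convert(data)
        except Exception as exc:
            # Record the failure and let the next accepting converter try
            errors.append(exc)
    if errors:
        raise ConversionFailed(errors)
    raise UnsupportedFormat("No converter accepted the input")
```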