MarkItDown

The MarkItDown class is the primary interface for converting various document formats to Markdown. It manages converter registration, file type detection, and the conversion process.

Constructor

MarkItDown(
    *,
    enable_builtins: Union[None, bool] = None,
    enable_plugins: Union[None, bool] = None,
    **kwargs
)

Create a new MarkItDown instance.

enable_builtins

bool | None

default:"None"

Enable built-in converters. When None (the default) or True, the built-in converters are registered automatically.

enable_plugins

bool | None

default:"None"

Enable plugin converters. When True, converters from installed plugins are registered. None (the default) and False leave plugins disabled.

requests_session

requests.Session

Custom requests session for HTTP operations. If not provided, a default session is created with appropriate Accept headers.

llm_client

Any

LLM client instance for converters that support AI-powered conversion.

llm_model

str

Model name to use with the LLM client.

llm_prompt

str

Custom prompt to use with LLM-based converters.

exiftool_path

str

Path to the exiftool binary for image metadata extraction. If not provided, searches common system paths.

style_map

str

Custom style map for DOCX conversion.

docintel_endpoint

str

Azure Document Intelligence endpoint URL. When provided, enables the Document Intelligence converter.

docintel_credential

Any

Credentials for Azure Document Intelligence.

docintel_file_types

list

File types to process with Document Intelligence.

docintel_api_version

str

API version for Document Intelligence service.

Example

from markitdown import MarkItDown

# Basic usage with defaults
md = MarkItDown()

# With custom configuration
md = MarkItDown(
    enable_builtins=True,
    enable_plugins=False,
    exiftool_path="/usr/local/bin/exiftool"
)

# With Azure Document Intelligence (credential is an Azure credential object created elsewhere)
md = MarkItDown(
    docintel_endpoint="https://your-resource.cognitiveservices.azure.com/",
    docintel_credential=credential
)
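
If you would rather not depend on the automatic search for exiftool, one option is to resolve the binary yourself and pass the result as exiftool_path. The sketch below uses the standard library's shutil.which, which only looks on PATH. It does not describe how MarkItDown performs its own lookup, and find_exiftool is a hypothetical helper name.

```python
import shutil
from typing import Optional


def find_exiftool(candidates=("exiftool",)) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


exiftool_path = find_exiftool()
# Pass the value to the constructor only when it is not None, for example:
#   md = MarkItDown(exiftool_path=exiftool_path)
print(exiftool_path)
```

When the helper returns None, omit the argument and let MarkItDown fall back to its own search.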

Methods

convert()

def convert(
    source: Union[str, requests.Response, Path, BinaryIO],
    *,
    stream_info: Optional[StreamInfo] = None,
    **kwargs: Any
) -> DocumentConverterResult

Convert a document from various source types to Markdown.

source

str | requests.Response | Path | BinaryIO

required

The source to convert. Can be:

Local file path (str or Path)
URL or URI string (http://, https://, file://, or data:)
requests.Response object
Binary file-like object (BinaryIO)

stream_info

StreamInfo

Optional metadata about the source. If not provided, MarkItDown attempts to infer it.

result

DocumentConverterResult

The conversion result containing the Markdown text and optional metadata.

Example

# Convert local file
result = md.convert("document.pdf")
print(result.markdown)

# Convert URL
result = md.convert("https://example.com/file.docx")

# Convert with explicit stream info
from markitdown import StreamInfo
result = md.convert(
    "file.txt",
    stream_info=StreamInfo(charset="utf-8", mimetype="text/plain")
)

# Convert binary stream
with open("file.pdf", "rb") as f:
    result = md.convert(f)
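
Because convert() accepts several kinds of input, it can help to see how such inputs might be told apart. The following is a rough sketch using only the standard library. classify_source is a hypothetical helper, not MarkItDown's actual dispatch logic, and requests.Response is left out to avoid a third-party dependency.

```python
import io
from pathlib import Path

# Prefixes treated as URIs in this sketch (an assumption, mirroring the list above)
URI_PREFIXES = ("http://", "https://", "file:", "data:")


def classify_source(source) -> str:
    """Label a source as 'uri', 'local path', or 'binary stream'."""
    if isinstance(source, str):
        if source.lower().startswith(URI_PREFIXES):
            return "uri"
        return "local path"
    if isinstance(source, Path):
        return "local path"
    if hasattr(source, "read"):
        return "binary stream"
    raise TypeError(f"Unsupported source type: {type(source).__name__}")


print(classify_source("https://example.com/file.docx"))  # uri
print(classify_source(Path("document.pdf")))             # local path
print(classify_source(io.BytesIO(b"bytes")))             # binary stream
```

If you already know which kind of source you have, calling the matching method (convert_local(), convert_uri(), convert_stream(), or convert_response()) makes the intent explicit.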

convert_local()

def convert_local(
    path: Union[str, Path],
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,  # Deprecated
    url: Optional[str] = None,  # Deprecated
    **kwargs: Any
) -> DocumentConverterResult

Convert a local file to Markdown.

path

str | Path

required

Path to the local file to convert.

stream_info

StreamInfo

Optional metadata about the file.

result

DocumentConverterResult

The conversion result.

Example

result = md.convert_local("/path/to/document.docx")
print(f"Title: {result.title}")
print(result.markdown)
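
When stream_info is omitted, metadata such as the extension and MIME type has to be inferred. As an illustration, here is a minimal sketch of name-based guessing with the standard library's mimetypes module. Whether MarkItDown uses mimetypes in this way is an assumption, guess_file_metadata is a hypothetical helper, and content-based detection is not shown.

```python
import mimetypes
from pathlib import Path


def guess_file_metadata(path):
    """Guess (extension, mimetype) from the file name alone, without opening the file."""
    p = Path(path)
    mimetype, _encoding = mimetypes.guess_type(p.name)
    return p.suffix.lower(), mimetype


print(guess_file_metadata("/path/to/report.pdf"))
print(guess_file_metadata("/path/to/notes.txt"))
```

A name-based guess can be wrong for misnamed files, which is one reason to pass an explicit StreamInfo when you know the format.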

convert_stream()

def convert_stream(
    stream: BinaryIO,
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,  # Deprecated
    url: Optional[str] = None,  # Deprecated
    **kwargs: Any
) -> DocumentConverterResult

Convert a binary stream to Markdown.

stream

BinaryIO

required

Binary file-like object to convert. Must support read() method. If not seekable, the stream is loaded into memory.

stream_info

StreamInfo

Optional metadata about the stream. Used for format detection.

result

DocumentConverterResult

The conversion result.

Example

import io

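# Convert in-memory bytes (placeholder data; substitute the real bytes of a document)
data = b"PDF content..."
stream = io.BytesIO(data)
result = md.convert_stream(stream)

# With stream info (rewind first, because the call above has already read the stream)
stream.seek(0)
from markitdown import StreamInfo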
result = md.convert_stream(
    stream,
    stream_info=StreamInfo(extension=".pdf", mimetype="application/pdf")
)
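
The note about non-seekable streams being loaded into memory can be illustrated with a small sketch. It shows one way to buffer such a stream with the standard library. ensure_seekable and OneWayStream are hypothetical names, and this is not a copy of MarkItDown's internal code.

```python
import io
from typing import BinaryIO


def ensure_seekable(stream: BinaryIO) -> BinaryIO:
    """Return the stream unchanged if seekable, else a BytesIO copy of its remaining bytes."""
    if stream.seekable():
        return stream
    return io.BytesIO(stream.read())


class OneWayStream(io.RawIOBase):
    """A tiny non-seekable stream used only to exercise the helper."""

    def __init__(self, payload: bytes):
        super().__init__()
        self._buffer = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, b) -> int:
        return self._buffer.readinto(b)


buffered = ensure_seekable(OneWayStream(b"hello"))
print(buffered.seekable(), buffered.read())  # True b'hello'
```

Buffering means the whole payload is held in memory, so very large non-seekable inputs may be better written to a temporary file and converted with convert_local().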

convert_url()

def convert_url(
    url: str,
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,
    mock_url: Optional[str] = None,
    **kwargs: Any
) -> DocumentConverterResult

Convert a URL to Markdown. This is an alias for convert_uri().

url

str

required

URL to convert (http://, https://, file://, or data:).

stream_info

StreamInfo

Optional metadata override.

mock_url

str

Pretend the content came from this URL instead (for converter routing).

result

DocumentConverterResult

The conversion result.

Example

# Convert web page
result = md.convert_url("https://wikipedia.org/wiki/Python")

# Convert file URI
result = md.convert_url("file:///path/to/document.pdf")

# Convert data URI
result = md.convert_url("data:text/plain;base64,SGVsbG8gV29ybGQ=")
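
To make the file:// case concrete, the sketch below shows how a file URI maps to a local path using only urllib.parse. file_uri_to_path is a hypothetical helper, it handles POSIX-style paths only, and it is not taken from MarkItDown's source.

```python
from urllib.parse import unquote, urlparse


def file_uri_to_path(uri: str) -> str:
    """Extract the percent-decoded path component from a file: URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"Not a file URI: {uri}")
    return unquote(parsed.path)


print(file_uri_to_path("file:///path/to/document.pdf"))    # /path/to/document.pdf
print(file_uri_to_path("file:///path/to/my%20notes.txt"))  # /path/to/my notes.txt
```

Windows drive letters and UNC hosts need extra handling that this sketch leaves out.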

convert_uri()

def convert_uri(
    uri: str,
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,
    mock_url: Optional[str] = None,
    **kwargs: Any
) -> DocumentConverterResult

Convert a URI to Markdown. Supports the http://, https://, file://, and data: schemes.

uri

str

required

URI to convert. Supported schemes:

http:// and https://: Fetches the content over HTTP
file://: Reads a local file
data: (no slashes): Decodes the inline payload of a data URI

stream_info

StreamInfo

Optional metadata override.

mock_url

str

Mock the request as if it came from a different URL.

result

DocumentConverterResult

The conversion result.

Example

# HTTP URI
result = md.convert_uri("https://example.com/doc.pdf")

# File URI
result = md.convert_uri("file:///home/user/document.docx")

# Data URI
result = md.convert_uri(
    "data:text/html;charset=utf-8,%3Ch1%3EHello%3C%2Fh1%3E"
)
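
The two data URIs used in these examples differ in encoding: one is base64, the other percent-encoded. A minimal sketch of decoding both forms with the standard library follows. decode_data_uri is a hypothetical helper written for illustration, not MarkItDown's parser.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str):
    """Split a data: URI into (mimetype, payload bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("Not a data URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("Malformed data URI: missing comma")
    parts = header.split(";")
    mimetype = parts[0] or "text/plain"  # default media type per RFC 2397
    if "base64" in parts[1:]:
        return mimetype, base64.b64decode(payload)
    return mimetype, unquote_to_bytes(payload)


print(decode_data_uri("data:text/plain;base64,SGVsbG8gV29ybGQ="))
# ('text/plain', b'Hello World')
print(decode_data_uri("data:text/html;charset=utf-8,%3Ch1%3EHello%3C%2Fh1%3E"))
# ('text/html', b'<h1>Hello</h1>')
```

Note that a data URI begins with data: followed directly by the media type, with no // after the scheme.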

convert_response()

def convert_response(
    response: requests.Response,
    *,
    stream_info: Optional[StreamInfo] = None,
    file_extension: Optional[str] = None,
    url: Optional[str] = None,
    **kwargs: Any
) -> DocumentConverterResult

Convert an HTTP response to Markdown.

response

requests.Response

required

HTTP response object from the requests library.

stream_info

StreamInfo

Optional metadata override. By default, metadata is extracted from response headers.

result

DocumentConverterResult

The conversion result.

Example

import requests

response = requests.get("https://example.com/document.pdf")
result = md.convert_response(response)
print(result.markdown)
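
As an illustration of what "metadata extracted from response headers" can mean, the sketch below derives a MIME type, charset, and file name from Content-Type and Content-Disposition using the standard library's email.message.Message. metadata_from_headers is a hypothetical helper. Which headers MarkItDown actually reads, and how, is an assumption here.

```python
import os
from email.message import Message


def metadata_from_headers(headers: dict) -> dict:
    """Derive mimetype, charset, and file name hints from HTTP response headers."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    has_type = "Content-Type" in msg  # header lookup is case-insensitive
    filename = msg.get_filename()     # read from Content-Disposition
    return {
        "mimetype": msg.get_content_type() if has_type else None,
        "charset": msg.get_content_charset(),
        "filename": filename,
        "extension": os.path.splitext(filename)[1] if filename else None,
    }


info = metadata_from_headers({
    "Content-Type": "application/pdf; charset=UTF-8",
    "Content-Disposition": 'attachment; filename="report.pdf"',
})
print(info)
```

When a server sends misleading headers, passing an explicit stream_info to convert_response() overrides the inferred values.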

register_converter()

def register_converter(
    converter: DocumentConverter,
    *,
    priority: float = PRIORITY_SPECIFIC_FILE_FORMAT
) -> None

Register a document converter with the given priority.

converter

DocumentConverter

required

The converter instance to register.

priority

float

default:"0.0"

Converter priority. Lower values are tried first. Use:

PRIORITY_SPECIFIC_FILE_FORMAT (0.0) for specific formats
PRIORITY_GENERIC_FILE_FORMAT (10.0) for generic/catch-all converters

Example

from markitdown import (
    MarkItDown,
    DocumentConverter,
    DocumentConverterResult,
    PRIORITY_SPECIFIC_FILE_FORMAT,
)

class MyCustomConverter(DocumentConverter):
    def accepts(self, file_stream, stream_info, **kwargs):
        # Claim only files with the .custom extension
        return stream_info.extension == ".custom"

    def convert(self, file_stream, stream_info, **kwargs):
        # Conversion logic
        return DocumentConverterResult(markdown="# Custom content")

md = MarkItDown()
md.register_converter(
    MyCustomConverter(),
    priority=PRIORITY_SPECIFIC_FILE_FORMAT
)
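
The rule that lower priority values are tried first can be sketched without MarkItDown at all. The example below models registrations as plain records and sorts them. Registration and trial_order are hypothetical names, and the tie-breaking among equal priorities shown here (insertion order) is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import dataclass

PRIORITY_SPECIFIC_FILE_FORMAT = 0.0
PRIORITY_GENERIC_FILE_FORMAT = 10.0


@dataclass
class Registration:
    name: str
    priority: float


def trial_order(registrations):
    """Return converter names in the order they would be tried (lowest priority value first)."""
    # sorted() is stable, so entries with equal priority keep their relative order.
    return [r.name for r in sorted(registrations, key=lambda r: r.priority)]


registry = [
    Registration("PlainTextConverter", PRIORITY_GENERIC_FILE_FORMAT),
    Registration("MyCustomConverter", PRIORITY_SPECIFIC_FILE_FORMAT),
    Registration("HtmlConverter", PRIORITY_GENERIC_FILE_FORMAT),
]
print(trial_order(registry))
# ['MyCustomConverter', 'PlainTextConverter', 'HtmlConverter']
```

Registering a format-specific converter at 0.0 therefore lets it run before the generic text, HTML, and ZIP converters at 10.0.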

enable_builtins()

def enable_builtins(**kwargs) -> None

Enable and register built-in converters. Built-in converters are enabled by default, so call this method at most once, and only if built-ins were disabled when the instance was created.

kwargs

dict

Configuration options passed to converters (llm_client, exiftool_path, etc.).

Example

# Create instance with builtins disabled
md = MarkItDown(enable_builtins=False)

# Enable later with configuration
md.enable_builtins(
    exiftool_path="/usr/local/bin/exiftool",
    llm_client=my_llm_client
)

enable_plugins()

def enable_plugins(**kwargs) -> None

Enable and register converters provided by installed plugins. Plugins are disabled by default. Call this method at most once, and only if plugins were not already enabled when the instance was created.

kwargs

dict

Configuration options passed to plugin converters.

Example

# Create instance with plugins disabled
md = MarkItDown(enable_plugins=False)

# Enable later
md.enable_plugins()

Constants

PRIORITY_SPECIFIC_FILE_FORMAT

PRIORITY_SPECIFIC_FILE_FORMAT = 0.0

Priority value for converters that handle specific file formats (e.g., .docx, .pdf, .xlsx) or specific websites (e.g., Wikipedia, YouTube).

PRIORITY_GENERIC_FILE_FORMAT

PRIORITY_GENERIC_FILE_FORMAT = 10.0

Priority value for near catch-all converters that handle generic mimetypes (e.g., text/*, application/zip, text/html).

Built-in Converters

When built-in converters are enabled (the default), the following converters are registered automatically:

PlainTextConverter - Plain text files (priority 10.0)
HtmlConverter - HTML documents (priority 10.0)
ZipConverter - ZIP archives (priority 10.0)
RssConverter - RSS feeds
WikipediaConverter - Wikipedia pages
YouTubeConverter - YouTube videos
BingSerpConverter - Bing search results
DocxConverter - Microsoft Word documents
XlsxConverter - Excel spreadsheets (.xlsx)
XlsConverter - Excel spreadsheets (.xls)
PptxConverter - PowerPoint presentations
PdfConverter - PDF documents
ImageConverter - Image files with OCR
AudioConverter - Audio files with transcription
IpynbConverter - Jupyter notebooks
OutlookMsgConverter - Outlook email messages
EpubConverter - EPUB ebooks
CsvConverter - CSV files
DocumentIntelligenceConverter - Azure Document Intelligence (when endpoint provided)

Error Handling

The convert() family of methods may raise the following exceptions:

FileConversionException - Converter attempted conversion but failed
UnsupportedFormatException - No converter can handle the format
MissingDependencyException - Required dependency not installed

from markitdown import MarkItDown, UnsupportedFormatException

md = MarkItDown()

try:
    result = md.convert("document.xyz")
except UnsupportedFormatException:
    print("This file format is not supported")
