ImageConverter

Overview

The ImageConverter class converts image files to Markdown by extracting EXIF metadata (if exiftool is installed) and generating descriptions with a multimodal LLM (if one is configured). It is particularly useful for generating accessible alt text and for summarizing image content.

Dependencies

pip install markitdown  # No extra dependencies required

Required: None (base install)
Optional: exiftool (external binary), OpenAI client for AI descriptions
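
Because exiftool is an optional external binary, it can help to check up front whether it is reachable. The following is a minimal sketch, assuming a standard-library PATH lookup via shutil.which; the helper name resolve_exiftool is hypothetical and not part of markitdown.

```python
import shutil
from typing import Optional


def resolve_exiftool(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a path to exiftool, or None if it cannot be found.

    Hypothetical helper: prefers an explicitly supplied path and otherwise
    falls back to searching the system PATH.
    """
    if explicit_path:
        return explicit_path
    return shutil.which("exiftool")


if __name__ == "__main__":
    path = resolve_exiftool()
    print(path or "exiftool not found; metadata extraction will be skipped")
```

The resolved value could then be passed to convert() as exiftool_path.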

Accepted Formats

MIME Types

image/jpeg
image/png

Extensions

.jpg
.jpeg
.png

Class Definition

class ImageConverter(DocumentConverter):
    """Converts images to markdown via extraction of metadata.
    
    Supports metadata extraction (if exiftool is installed) and
    description generation via multimodal LLM (if llm_client configured).
    """

Methods

accepts()

def accepts(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> bool

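Returns True when stream_info identifies the file as a JPEG or PNG image, using the MIME types and extensions listed under Accepted Formats; otherwise returns False.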
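
A minimal sketch of the kind of check accepts() performs, assuming it compares the extension and MIME type from stream_info against the lists under Accepted Formats. FakeStreamInfo and looks_like_supported_image are hypothetical stand-ins so the example runs without markitdown installed; the real method may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional

ACCEPTED_MIME_PREFIXES = ("image/jpeg", "image/png")
ACCEPTED_EXTENSIONS = (".jpg", ".jpeg", ".png")


@dataclass
class FakeStreamInfo:
    """Stand-in for StreamInfo with only the two fields used here."""

    extension: Optional[str] = None
    mimetype: Optional[str] = None


def looks_like_supported_image(info: FakeStreamInfo) -> bool:
    """Return True if the extension or MIME type marks a JPEG or PNG."""
    extension = (info.extension or "").lower()
    mimetype = (info.mimetype or "").lower()
    if extension in ACCEPTED_EXTENSIONS:
        return True
    # str.startswith accepts a tuple of candidate prefixes.
    return mimetype.startswith(ACCEPTED_MIME_PREFIXES)
```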

convert()

def convert(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> DocumentConverterResult

Converts an image file to Markdown containing the extracted metadata and, optionally, an AI-generated description.

Parameters:

file_stream (BinaryIO, required) - Binary stream of the image file
stream_info (StreamInfo, required) - Metadata about the file
exiftool_path (str) - Path to the exiftool binary. If not provided, the system PATH is searched.
llm_client (OpenAI client) - OpenAI-compatible client for image description generation
llm_model (str) - Vision model to use (e.g., “gpt-4o”, “gpt-4-vision-preview”)
llm_prompt (str, default: "Write a detailed caption for this image.") - Custom prompt for image description

The last four parameters are optional and are passed as keyword arguments.

Returns: DocumentConverterResult with metadata and description as Markdown

Features

Metadata Extraction

If exiftool is available, extracts these fields:

ImageSize - Dimensions (e.g., “1920x1080”)
Title - Image title
Caption - Embedded caption
Description - Image description
Keywords - Keyword tags
Artist - Creator/photographer
Author - Author name
DateTimeOriginal - When photo was taken
CreateDate - When file was created
GPSPosition - Geographic coordinates
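
The selection of fields can be sketched as follows. This is an illustrative approximation, not the converter's source: it assumes exiftool is invoked with its -json option reading the image from standard input, and that the fields are emitted as "Name: value" lines in the order listed above. The function names are hypothetical.

```python
import json
import subprocess
from typing import Any, BinaryIO, Dict

FIELDS = [
    "ImageSize", "Title", "Caption", "Description", "Keywords",
    "Artist", "Author", "DateTimeOriginal", "CreateDate", "GPSPosition",
]


def read_exif_metadata(
    file_stream: BinaryIO, exiftool_path: str = "exiftool"
) -> Dict[str, Any]:
    """Run exiftool on the stream contents; return {} on any failure."""
    position = file_stream.tell()
    try:
        completed = subprocess.run(
            [exiftool_path, "-json", "-"],  # "-" reads the file from stdin
            input=file_stream.read(),
            capture_output=True,
            check=True,
        )
        # exiftool -json prints a JSON array with one object per file.
        return json.loads(completed.stdout.decode("utf-8"))[0]
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return {}
    finally:
        file_stream.seek(position)


def format_metadata(metadata: Dict[str, Any]) -> str:
    """Keep only the listed fields, in list order, as 'Name: value' lines."""
    return "\n".join(
        f"{name}: {metadata[name]}" for name in FIELDS if name in metadata
    )
```

Fields that exiftool reports but that are not in the list (for example Megapixels) are dropped by format_metadata.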

AI Description

When llm_client and llm_model are provided:

Image converted to base64 data URI
Sent to vision model with prompt
Generated description added under a "# Description:" heading
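
The three steps above can be sketched end to end with an offline stand-in for the client. This is a hedged illustration: describe_image, append_description, and StubCompletions are hypothetical names, and the only real API assumed is the OpenAI-style client.chat.completions.create(model=..., messages=...) call.

```python
import base64
from types import SimpleNamespace
from typing import Any, Optional

DEFAULT_PROMPT = "Write a detailed caption for this image."


def describe_image(
    image_bytes: bytes,
    content_type: str,
    *,
    client: Any,
    model: str,
    prompt: str = DEFAULT_PROMPT,
) -> Optional[str]:
    """Encode the image as a data URI and ask the vision model about it."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    data_uri = f"data:{content_type};base64,{encoded}"
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


def append_description(markdown: str, description: Optional[str]) -> str:
    """Add the description under a '# Description:' heading, if present."""
    if description is None:
        return markdown
    return markdown + "\n# Description:\n" + description.strip() + "\n"


class StubCompletions:
    """Offline stand-in that records the request and returns a canned reply."""

    def create(self, *, model: str, messages: list) -> SimpleNamespace:
        self.last_request = {"model": model, "messages": messages}
        message = SimpleNamespace(content="A red square. ")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


stub_client = SimpleNamespace(chat=SimpleNamespace(completions=StubCompletions()))
```

Swapping stub_client for a real OpenAI-compatible client leaves describe_image unchanged.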

Example Usage

Metadata Only

from markitdown.converters import ImageConverter
from markitdown._stream_info import StreamInfo

converter = ImageConverter()

with open("photo.jpg", "rb") as f:
    stream_info = StreamInfo(
        extension=".jpg",
        mimetype="image/jpeg"
    )
    result = converter.convert(f, stream_info)
    print(result.markdown)

Output:

ImageSize: 1920x1080
Artist: John Doe
DateTimeOriginal: 2024-02-15 14:30:00
GPSPosition: 37.7749 N, 122.4194 W

With AI Description

from openai import OpenAI
from markitdown.converters import ImageConverter
from markitdown._stream_info import StreamInfo

# Replace the placeholder with a real API key
client = OpenAI(api_key="your-api-key")
converter = ImageConverter()

with open("landscape.jpg", "rb") as f:
    stream_info = StreamInfo(extension=".jpg")
    result = converter.convert(
        f,
        stream_info,
        llm_client=client,
        llm_model="gpt-4o",
        llm_prompt="Describe this landscape photo in detail."
    )
    print(result.markdown)

Output:

ImageSize: 3840x2160
DateTimeOriginal: 2024-02-15 16:45:00
GPSPosition: 45.4215 N, 75.6972 W

# Description:
A breathtaking mountain landscape at sunset. Snow-capped peaks rise majestically 
against a vibrant orange and pink sky. In the foreground, a crystal-clear alpine 
lake reflects the mountains, creating a mirror-like effect. Pine trees frame the 
scene on both sides, and wispy clouds add depth to the composition.

Custom exiftool Path

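from markitdown.converters import ImageConverter
from markitdown._stream_info import StreamInfo

converter = ImageConverter()

# Use a specific exiftool binary instead of searching PATH
with open("photo.png", "rb") as f:
    stream_info = StreamInfo(extension=".png")
    result = converter.convert(
        f,
        stream_info,
        exiftool_path="/usr/local/bin/exiftool"
    )
    print(result.markdown)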

Implementation Details

Source Location

~/workspace/source/packages/markitdown/src/markitdown/converters/_image_converter.py:16

AI Description Pipeline

def _get_llm_description(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    *,
    client,
    model,
    prompt=None,
) -> Union[None, str]

Encode Image - Convert to base64

base64_image = base64.b64encode(file_stream.read()).decode("utf-8")

Create Data URI

data_uri = f"data:{content_type};base64,{base64_image}"

Call Vision API - OpenAI-compatible chat completion with image

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_uri}}
        ]
    }
]
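
The data URI step interpolates content_type, but the excerpt does not show where that value comes from. A hedged sketch of one plausible resolution order, assuming the MIME type from stream_info is preferred and the file extension is used as a fallback; resolve_content_type is a hypothetical name.

```python
import mimetypes
from typing import Optional


def resolve_content_type(mimetype: Optional[str], extension: Optional[str]) -> str:
    """Pick a content type for the data URI.

    Order assumed here: explicit MIME type, then a guess based on the
    extension, then a generic binary type.
    """
    if mimetype:
        return mimetype
    # guess_type works on file names, so attach the extension to a dummy stem.
    guessed, _ = mimetypes.guess_type("image" + (extension or ""))
    return guessed or "application/octet-stream"
```

With this helper, StreamInfo(extension=".png") alone would be enough to produce a data:image/png;base64,... URI.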

Error Handling

Metadata extraction failures are silent (no metadata included)
AI description failures return None (no description section)
Stream position preserved on errors
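
One common way to keep the stream position intact is a try/finally around the read. The sketch below illustrates that pattern under the assumption that a failed read should yield None; it is not a quote of the converter's source, and read_all_preserving_position is a hypothetical name.

```python
import io
from typing import BinaryIO, Optional


def read_all_preserving_position(stream: BinaryIO) -> Optional[bytes]:
    """Read to the end of the stream, then restore the original position.

    Returns None if the read fails; the position is restored either way.
    """
    position = stream.tell()
    try:
        return stream.read()
    except Exception:
        return None
    finally:
        stream.seek(position)


if __name__ == "__main__":
    buffer = io.BytesIO(b"abcdef")
    buffer.seek(2)
    print(read_all_preserving_position(buffer))  # b'cdef'
    print(buffer.tell())  # 2
```

Restoring the position lets a later step, such as a second converter, read the same stream from where it started.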

Use Cases

Accessibility

# Generate alt text for web images
# (assumes converter, client, image_stream, and stream_info are set up as in Example Usage)
result = converter.convert(
    image_stream,
    stream_info,
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Generate concise alt text for this image suitable for screen readers."
)

Photo Organization

# Extract metadata for photo library
from pathlib import Path

from markitdown.converters import ImageConverter
from markitdown._stream_info import StreamInfo

converter = ImageConverter()

for image_path in Path("photos").glob("*.jpg"):
    with open(image_path, "rb") as f:
        result = converter.convert(
            f,
            StreamInfo(extension=".jpg"),
            exiftool_path="/usr/local/bin/exiftool"
        )
        print(f"{image_path.name}:")
        print(result.markdown)
        print("-" * 40)

Image Analysis

# Analyze product images
# (assumes converter, client, product_image, and stream_info are set up as in Example Usage)
result = converter.convert(
    product_image,
    stream_info,
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this product image. Include colors, materials, and notable features."
)

Limitations

Only JPEG and PNG are supported; other formats (GIF, WebP, TIFF, etc.) are not handled
Metadata extraction requires external exiftool binary
AI descriptions require API access and incur costs
Large images may exceed API size limits
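
Since the image is sent as base64, the payload is roughly a third larger than the file itself. A minimal pre-flight check might look like the sketch below. The threshold is a placeholder chosen for illustration, not a documented limit of any provider, and the function names are hypothetical.

```python
import io
import os
from typing import BinaryIO

# Placeholder threshold for illustration only; check your provider's limits.
MAX_ENCODED_BYTES = 20 * 1024 * 1024


def remaining_bytes(stream: BinaryIO) -> int:
    """Count bytes from the current position to the end, without consuming them."""
    position = stream.tell()
    try:
        return stream.seek(0, os.SEEK_END) - position
    finally:
        stream.seek(position)


def encoded_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_limit(stream: BinaryIO, max_encoded_bytes: int = MAX_ENCODED_BYTES) -> bool:
    """Return True if the base64 form of the stream fits under the threshold."""
    return encoded_size(remaining_bytes(stream)) <= max_encoded_bytes


if __name__ == "__main__":
    sample = io.BytesIO(b"\x00" * 3000)
    print(fits_limit(sample))  # True: 4000 encoded bytes is under the placeholder
```

Images that fail the check could be skipped or converted without llm_client so that only metadata is extracted.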
