Overview

The ImageConverter class converts image files to Markdown by extracting EXIF metadata (if exiftool is installed) and generating descriptions with a multimodal LLM (if one is configured). It is particularly useful for generating accessible alt text and summarizing image content.

Dependencies

pip install markitdown  # No extra dependencies required
Required: None (base install)
Optional: exiftool (external binary), OpenAI client for AI descriptions
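
Because exiftool is an optional external binary, it can help to check whether it is reachable before relying on metadata output. The following is a minimal sketch using only the standard library; the helper name `find_exiftool` is hypothetical and not part of markitdown.

```python
import shutil
from typing import Optional


def find_exiftool(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable exiftool path, or None if no executable is found.

    Hypothetical helper for illustration; not part of markitdown.
    """
    if explicit_path:
        # shutil.which also accepts a full path and verifies it is executable.
        return shutil.which(explicit_path)
    # Fall back to searching the directories listed in PATH.
    return shutil.which("exiftool")


if __name__ == "__main__":
    path = find_exiftool()
    print(path or "exiftool not found; metadata extraction would be skipped")
```

The returned path, if any, could then be passed as exiftool_path.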

Accepted Formats

MIME Types (list)
  • image/jpeg
  • image/png
Extensions (list)
  • .jpg
  • .jpeg
  • .png

Class Definition

class ImageConverter(DocumentConverter):
    """Converts images to markdown via extraction of metadata.
    
    Supports metadata extraction (if exiftool is installed) and
    description generation via multimodal LLM (if llm_client configured).
    """

Methods

accepts()

def accepts(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> bool
Returns True when the stream's extension or MIME type identifies a JPEG or PNG image.
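
As a rough illustration of this kind of check, the sketch below tests an extension and MIME type against the accepted formats listed earlier. The function name `looks_like_supported_image` is hypothetical, and the exact matching rules (case-insensitive comparison, prefix match on the MIME type) are assumptions rather than a description of the library's code.

```python
from typing import Optional

# Values taken from the "Accepted Formats" section.
ACCEPTED_MIME_TYPES = ("image/jpeg", "image/png")
ACCEPTED_EXTENSIONS = (".jpg", ".jpeg", ".png")


def looks_like_supported_image(
    extension: Optional[str], mimetype: Optional[str]
) -> bool:
    """Hypothetical stand-in for the kind of check accepts() performs."""
    ext = (extension or "").lower()
    mime = (mimetype or "").lower()
    # Either signal is enough: a known extension or a known MIME type prefix.
    return ext in ACCEPTED_EXTENSIONS or mime.startswith(ACCEPTED_MIME_TYPES)
```

Either field may be missing, which is why both are treated as optional here.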

convert()

def convert(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> DocumentConverterResult
Converts an image file to Markdown with metadata and optional AI description.
Parameters:
  • file_stream (BinaryIO, required) - Binary stream of the image file
  • stream_info (StreamInfo, required) - Metadata about the file
  • exiftool_path (str) - Path to exiftool binary. If not provided, searches system PATH.
  • llm_client (OpenAI client) - OpenAI-compatible client for image description generation
  • llm_model (str) - Vision model to use (e.g., “gpt-4o”, “gpt-4-vision-preview”)
  • llm_prompt (str, default: "Write a detailed caption for this image.") - Custom prompt for image description
Returns: DocumentConverterResult with metadata and description as Markdown

Features

Metadata Extraction

If exiftool is available, extracts these fields:
  • ImageSize - Dimensions (e.g., “1920x1080”)
  • Title - Image title
  • Caption - Embedded caption
  • Description - Image description
  • Keywords - Keyword tags
  • Artist - Creator/photographer
  • Author - Author name
  • DateTimeOriginal - When photo was taken
  • CreateDate - When file was created
  • GPSPosition - Geographic coordinates
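
To make the field selection concrete, here is a minimal sketch that runs exiftool with its `-json` option and keeps only the fields listed above, formatted as "Field: value" lines like the example outputs on this page. The helper names `read_exif` and `format_metadata` are hypothetical, and invoking exiftool on a file path (rather than a stream) is an assumption made to keep the sketch short.

```python
import json
import subprocess
from typing import Any, Dict

# Field names and order copied from the list above.
METADATA_FIELDS = [
    "ImageSize", "Title", "Caption", "Description", "Keywords",
    "Artist", "Author", "DateTimeOriginal", "CreateDate", "GPSPosition",
]


def read_exif(path: str, exiftool_path: str = "exiftool") -> Dict[str, Any]:
    """Run exiftool on a file and return its tags (empty dict on any failure)."""
    try:
        proc = subprocess.run(
            [exiftool_path, "-json", path],
            capture_output=True, text=True, check=True,
        )
        # exiftool -json prints a JSON array with one object per input file.
        return json.loads(proc.stdout)[0]
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return {}


def format_metadata(tags: Dict[str, Any]) -> str:
    """Keep only the listed fields, one 'Field: value' line each."""
    return "".join(
        f"{name}: {tags[name]}\n" for name in METADATA_FIELDS if name in tags
    )
```

Returning an empty dict on failure mirrors the silent-failure behavior described under Error Handling.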

AI Description

When llm_client and llm_model are both provided:
  1. The image is converted to a base64 data URI
  2. The data URI is sent to the vision model along with the prompt
  3. The generated description is added under a “# Description:” heading
Example Usage

Metadata Only

from markitdown.converters import ImageConverter
from markitdown._stream_info import StreamInfo

converter = ImageConverter()

with open("photo.jpg", "rb") as f:
    stream_info = StreamInfo(
        extension=".jpg",
        mimetype="image/jpeg"
    )
    result = converter.convert(f, stream_info)
    print(result.markdown)
Output:
ImageSize: 1920x1080
DateTimeOriginal: 2024-02-15 14:30:00
GPSPosition: 37.7749 N, 122.4194 W
Artist: John Doe

With AI Description

from openai import OpenAI
from markitdown.converters import ImageConverter
from markitdown._stream_info import StreamInfo

client = OpenAI(api_key="your-api-key")
converter = ImageConverter()

with open("landscape.jpg", "rb") as f:
    stream_info = StreamInfo(extension=".jpg")
    result = converter.convert(
        f,
        stream_info,
        llm_client=client,
        llm_model="gpt-4o",
        llm_prompt="Describe this landscape photo in detail."
    )
    print(result.markdown)
Output:
ImageSize: 3840x2160
DateTimeOriginal: 2024-02-15 16:45:00
GPSPosition: 45.4215 N, 75.6972 W

# Description:
A breathtaking mountain landscape at sunset. Snow-capped peaks rise majestically 
against a vibrant orange and pink sky. In the foreground, a crystal-clear alpine 
lake reflects the mountains, creating a mirror-like effect. Pine trees frame the 
scene on both sides, and wispy clouds add depth to the composition.

Custom exiftool Path

# Reuses the converter and imports from the examples above
with open("photo.png", "rb") as f:
    stream_info = StreamInfo(extension=".png")
    result = converter.convert(
        f,
        stream_info,
        exiftool_path="/usr/local/bin/exiftool"
    )
    print(result.markdown)

Implementation Details

Source Location

~/workspace/source/packages/markitdown/src/markitdown/converters/_image_converter.py:16

AI Description Pipeline

def _get_llm_description(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    *,
    client,
    model,
    prompt=None,
) -> Union[None, str]
  1. Encode Image - Convert to base64
    base64_image = base64.b64encode(file_stream.read()).decode("utf-8")
    
  2. Create Data URI
    data_uri = f"data:{content_type};base64,{base64_image}"
    
  3. Call Vision API - OpenAI-compatible chat completion with image
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}}
            ]
        }
    ]
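
Put together, the three steps above can be sketched as one runnable helper that needs no network access. The function name `build_vision_messages` is hypothetical, and falling back to the default prompt when none is given is an assumption based on the documented default for llm_prompt.

```python
import base64
from typing import Any, BinaryIO, Dict, List, Optional

DEFAULT_PROMPT = "Write a detailed caption for this image."


def build_vision_messages(
    file_stream: BinaryIO,
    content_type: str,
    prompt: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Build an OpenAI-style chat message list containing one inline image."""
    if not prompt:
        prompt = DEFAULT_PROMPT
    # Step 1: base64-encode the raw image bytes.
    base64_image = base64.b64encode(file_stream.read()).decode("utf-8")
    # Step 2: wrap the encoded bytes in a data URI.
    data_uri = f"data:{content_type};base64,{base64_image}"
    # Step 3: pair the text prompt with the image in a single user message.
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }
    ]
```

The resulting list could then be passed as the messages argument of an OpenAI-compatible chat completion call.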
    

Error Handling

  • Metadata extraction failures are silent (no metadata included)
  • AI description failures return None (no description section)
  • Stream position preserved on errors
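
One common way to get this behavior is to record the stream position up front and restore it in a finally block. The sketch below assumes a seekable stream; the wrapper name `call_preserving_position` is hypothetical, and it restores the position on success as well as on failure, which goes slightly beyond what the list above states.

```python
from typing import BinaryIO, Callable, Optional


def call_preserving_position(
    file_stream: BinaryIO,
    func: Callable[[BinaryIO], str],
) -> Optional[str]:
    """Call func(file_stream); return None on error and rewind the stream.

    Hypothetical wrapper illustrating silent failure plus position restore.
    """
    start = file_stream.tell()
    try:
        return func(file_stream)
    except Exception:
        # Fail silently: the caller simply gets no result for this step.
        return None
    finally:
        # Runs on both success and failure, so later steps can re-read.
        file_stream.seek(start)
```

With this pattern, a failed metadata or description step does not prevent a later step from reading the same stream.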

Use Cases

Accessibility

# Generate alt text for web images
result = converter.convert(
    image_stream,
    stream_info,
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Generate concise alt text for this image suitable for screen readers."
)

Photo Organization

# Extract metadata for a photo library
from pathlib import Path

for image_path in Path("photos").glob("*.jpg"):
    with open(image_path, "rb") as f:
        result = converter.convert(
            f,
            StreamInfo(extension=".jpg"),
            exiftool_path="/usr/local/bin/exiftool"
        )
        print(f"{image_path.name}:")
        print(result.markdown)
        print("-" * 40)

Image Analysis

# Analyze product images
result = converter.convert(
    product_image,
    stream_info,
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this product image. Include colors, materials, and notable features."
)

Limitations

  • Only JPEG and PNG are supported; other formats (GIF, WebP, TIFF, etc.) are not handled
  • Metadata extraction requires external exiftool binary
  • AI descriptions require API access and incur costs
  • Large images may exceed API size limits
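
For the size limitation, a pre-flight check can estimate the base64 payload before any API call is made. This is a minimal sketch: the helper names are hypothetical, the stream is assumed to be seekable, and the limit itself is left as a parameter because actual limits depend on the provider.

```python
import io
from typing import BinaryIO


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of data."""
    return 4 * ((n_bytes + 2) // 3)


def fits_within_limit(file_stream: BinaryIO, max_encoded_bytes: int) -> bool:
    """Check whether the rest of the stream fits once base64-encoded.

    Hypothetical helper; leaves the stream position unchanged.
    """
    start = file_stream.tell()
    file_stream.seek(0, io.SEEK_END)
    raw_size = file_stream.tell() - start
    file_stream.seek(start)
    return base64_size(raw_size) <= max_encoded_bytes
```

Base64 inflates the payload by roughly one third, so an image just under a provider's limit on disk may still be rejected once encoded.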