Overview
The ImageConverter class converts image files to Markdown by extracting EXIF metadata (if exiftool is installed) and generating descriptions with a multimodal LLM (if one is configured). It is particularly useful for generating accessible alt text and summarizing image content.
Dependencies
pip install markitdown # No extra dependencies required
Required: None (base install)
Optional: exiftool (external binary), OpenAI client for AI descriptions
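Because exiftool is an optional external binary, it can help to check for it before relying on metadata output. The following is a minimal sketch using only the standard library; `find_exiftool` is a hypothetical helper name and is not part of markitdown.

```python
import shutil
from typing import Optional


def find_exiftool(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable exiftool path, or None if none can be located.

    Hypothetical helper for illustration; not part of markitdown.
    """
    if explicit_path:
        # shutil.which also accepts a full path and checks it is executable.
        return shutil.which(explicit_path)
    # Otherwise search the directories listed in PATH.
    return shutil.which("exiftool")


if __name__ == "__main__":
    path = find_exiftool()
    print(path if path else "exiftool not found; metadata would be skipped")
```

A returned path could then be passed as `exiftool_path` when calling `convert()`.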
Class Definition
class ImageConverter(DocumentConverter):
    """Converts images to markdown via extraction of metadata.
    Supports metadata extraction (if exiftool is installed) and
    description generation via multimodal LLM (if llm_client configured).
    """
Methods
accepts()
def accepts(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> bool
Returns True for JPEG and PNG images.
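As a rough illustration of what such a check might look like, the sketch below tests a file extension and a MIME type against JPEG and PNG values. The constant lists and the function name `looks_like_supported_image` are assumptions made for this example; consult the converter source for the actual logic.

```python
from typing import Optional

# Assumed values for illustration only.
ACCEPTED_MIME_PREFIXES = ("image/jpeg", "image/png")
ACCEPTED_EXTENSIONS = (".jpg", ".jpeg", ".png")


def looks_like_supported_image(
    extension: Optional[str], mimetype: Optional[str]
) -> bool:
    """Hypothetical stand-in for the kind of check accepts() performs."""
    ext = (extension or "").lower()
    mime = (mimetype or "").lower()
    # An extension match is enough on its own.
    if ext in ACCEPTED_EXTENSIONS:
        return True
    # Otherwise fall back to the MIME type prefix.
    return any(mime.startswith(prefix) for prefix in ACCEPTED_MIME_PREFIXES)


if __name__ == "__main__":
    print(looks_like_supported_image(".JPG", None))
    print(looks_like_supported_image(".gif", "image/gif"))
```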
convert()
def convert(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> DocumentConverterResult
Converts an image file to Markdown with metadata and optional AI description.
Parameters:
- file_stream: Binary stream of the image file
- exiftool_path: Path to the exiftool binary. If not provided, searches system PATH.
- llm_client: OpenAI-compatible client for image description generation
- llm_model: Vision model to use (e.g., "gpt-4o", "gpt-4-vision-preview")
- llm_prompt (str, default: "Write a detailed caption for this image."): Custom prompt for image description
Returns: DocumentConverterResult with metadata and description as Markdown
Features
If exiftool is available, extracts these fields:
ImageSize - Dimensions (e.g., “1920x1080”)
Title - Image title
Caption - Embedded caption
Description - Image description
Keywords - Keyword tags
Artist - Creator/photographer
Author - Author name
DateTimeOriginal - When photo was taken
CreateDate - When file was created
GPSPosition - Geographic coordinates
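To make the field list concrete, here is a small sketch that picks these fields out of a larger metadata dictionary and renders them as "Name: value" lines. The helper name `format_metadata` is hypothetical, and both the line layout and the field ordering are assumptions inferred from the sample output on this page.

```python
from typing import Any, Dict

# Field names in the order listed above (ordering is an assumption).
METADATA_FIELDS = [
    "ImageSize", "Title", "Caption", "Description", "Keywords",
    "Artist", "Author", "DateTimeOriginal", "CreateDate", "GPSPosition",
]


def format_metadata(metadata: Dict[str, Any]) -> str:
    """Render known fields as 'Name: value' lines, skipping absent ones."""
    lines = [
        f"{name}: {metadata[name]}"
        for name in METADATA_FIELDS
        if name in metadata
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # "Megapixels" is not in the field list, so it is ignored.
    sample = {"Artist": "John Doe", "ImageSize": "1920x1080", "Megapixels": 2.1}
    print(format_metadata(sample))
```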
AI Description
When llm_client and llm_model are provided:
- Image converted to base64 data URI
- Sent to vision model with prompt
- Generated description added under a "# Description:" heading
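The first step, turning image bytes into a base64 data URI, can be sketched with the standard library alone. `build_data_uri` is a hypothetical helper, and the fallback order for the content type (explicit MIME type, then a guess from the extension, then a generic type) is an assumption rather than documented behavior.

```python
import base64
import mimetypes
from typing import Optional


def build_data_uri(
    image_bytes: bytes,
    mimetype: Optional[str] = None,
    extension: Optional[str] = None,
) -> str:
    """Encode raw image bytes as a base64 data URI (illustrative helper)."""
    content_type = mimetype
    if not content_type and extension:
        # guess_type only inspects the name, so a placeholder stem suffices.
        content_type, _ = mimetypes.guess_type("placeholder" + extension)
    if not content_type:
        content_type = "application/octet-stream"
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{content_type};base64,{encoded}"


if __name__ == "__main__":
    print(build_data_uri(b"abc", mimetype="image/png"))
```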
Example Usage
from markitdown.converters import ImageConverter
from markitdown._stream_info import StreamInfo
converter = ImageConverter()
with open("photo.jpg", "rb") as f:
    stream_info = StreamInfo(
        extension=".jpg",
        mimetype="image/jpeg"
    )
    result = converter.convert(f, stream_info)
print(result.markdown)
Output:
ImageSize: 1920x1080
DateTimeOriginal: 2024-02-15 14:30:00
GPSPosition: 37.7749 N, 122.4194 W
Artist: John Doe
With AI Description
from openai import OpenAI
client = OpenAI(api_key="your-api-key")
converter = ImageConverter()
with open("landscape.jpg", "rb") as f:
    stream_info = StreamInfo(extension=".jpg")
    result = converter.convert(
        f,
        stream_info,
        llm_client=client,
        llm_model="gpt-4o",
        llm_prompt="Describe this landscape photo in detail."
    )
print(result.markdown)
Output:
ImageSize: 3840x2160
DateTimeOriginal: 2024-02-15 16:45:00
GPSPosition: 45.4215 N, 75.6972 W
# Description:
A breathtaking mountain landscape at sunset. Snow-capped peaks rise majestically
against a vibrant orange and pink sky. In the foreground, a crystal-clear alpine
lake reflects the mountains, creating a mirror-like effect. Pine trees frame the
scene on both sides, and wispy clouds add depth to the composition.
With Custom exiftool Path
with open("photo.png", "rb") as f:
    stream_info = StreamInfo(extension=".png")
    result = converter.convert(
        f,
        stream_info,
        exiftool_path="/usr/local/bin/exiftool"
    )
print(result.markdown)
Implementation Details
Source Location
~/workspace/source/packages/markitdown/src/markitdown/converters/_image_converter.py:16
AI Description Pipeline
def _get_llm_description(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    *,
    client,
    model,
    prompt=None,
) -> Union[None, str]
- Encode Image: convert to base64
  base64_image = base64.b64encode(file_stream.read()).decode("utf-8")
- Create Data URI
  data_uri = f"data:{content_type};base64,{base64_image}"
- Call Vision API: OpenAI-compatible chat completion with image
  messages = [
      {
          "role": "user",
          "content": [
              {"type": "text", "text": prompt},
              {"type": "image_url", "image_url": {"url": data_uri}}
          ]
      }
  ]
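The final step, sending the message list and reading back the reply, might look like the sketch below. It uses the OpenAI-style `client.chat.completions.create(...)` call and reads `response.choices[0].message.content`. The function name `describe_image` and the stub client are hypothetical; the stub exists only so the example runs offline without an API key.

```python
from types import SimpleNamespace
from typing import Any, Optional

DEFAULT_PROMPT = "Write a detailed caption for this image."


def describe_image(client: Any, model: str, data_uri: str,
                   prompt: Optional[str] = None) -> Optional[str]:
    """Send one text-plus-image message and return the reply text."""
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt or DEFAULT_PROMPT},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


class _StubCompletions:
    """Offline stand-in that records the request and returns a canned reply."""

    def __init__(self) -> None:
        self.last_request: dict = {}

    def create(self, **kwargs: Any) -> SimpleNamespace:
        self.last_request = kwargs
        message = SimpleNamespace(content="A stub caption.")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def make_stub_client() -> SimpleNamespace:
    """Build an object shaped like client.chat.completions for testing."""
    return SimpleNamespace(chat=SimpleNamespace(completions=_StubCompletions()))


if __name__ == "__main__":
    stub = make_stub_client()
    print(describe_image(stub, "gpt-4o", "data:image/png;base64,YWJj"))
```

Swapping the stub for a real OpenAI-compatible client should leave `describe_image` unchanged, since it depends only on the `chat.completions.create` call shape.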
Error Handling
- Metadata extraction failures are silent (no metadata included)
- AI description failures return None (no description section)
- Stream position preserved on errors
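One common way to preserve a stream's position across a read that might fail is to record the offset first and restore it in a `finally` clause. The sketch below illustrates that pattern; `read_base64_preserving_position` is a hypothetical name, and this is not necessarily how the converter itself is written.

```python
import base64
import io
from typing import BinaryIO, Optional


def read_base64_preserving_position(file_stream: BinaryIO) -> Optional[str]:
    """Read the rest of the stream as base64, then rewind to the start offset.

    Returns None if reading fails.
    """
    start = file_stream.tell()
    try:
        return base64.b64encode(file_stream.read()).decode("utf-8")
    except Exception:
        return None
    finally:
        # Runs on success and on failure, so later readers see the same offset.
        file_stream.seek(start)


if __name__ == "__main__":
    stream = io.BytesIO(b"abc")
    encoded = read_base64_preserving_position(stream)
    print(encoded, stream.tell())
```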
Use Cases
Accessibility
# Generate alt text for web images
result = converter.convert(
    image_stream,
    stream_info,
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Generate concise alt text for this image suitable for screen readers."
)
Photo Organization
# Extract metadata for photo library
from pathlib import Path
for image_path in Path("photos").glob("*.jpg"):
    with open(image_path, "rb") as f:
        result = converter.convert(
            f,
            StreamInfo(extension=".jpg"),
            exiftool_path="/usr/local/bin/exiftool"
        )
    print(f"{image_path.name}:")
    print(result.markdown)
    print("-" * 40)
Image Analysis
# Analyze product images
result = converter.convert(
    product_image,
    stream_info,
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this product image. Include colors, materials, and notable features."
)
Limitations
- Only JPEG and PNG formats supported
- Other formats (GIF, WebP, TIFF, etc.) not handled
- Metadata extraction requires the external exiftool binary
- AI descriptions require API access and incur costs
- Large images may exceed API size limits
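Given these limitations, a pre-flight check before conversion can catch unsupported formats and oversized files early. The sketch below is one possible approach; `preflight_problems` is a hypothetical helper, and the 20 MiB threshold is a placeholder assumption, since real limits depend on the API provider and model.

```python
from pathlib import Path
from typing import List, Union

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
# Placeholder threshold; real limits depend on the API provider and model.
ASSUMED_MAX_BYTES = 20 * 1024 * 1024


def preflight_problems(
    path: Union[str, Path], max_bytes: int = ASSUMED_MAX_BYTES
) -> List[str]:
    """Return human-readable reasons the file may not convert well."""
    path = Path(path)
    problems: List[str] = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    # Only check size when the file is actually present on disk.
    if path.exists() and path.stat().st_size > max_bytes:
        problems.append(f"file is larger than {max_bytes} bytes")
    return problems


if __name__ == "__main__":
    print(preflight_problems("photo.gif"))
```

An empty list means neither check found a problem; it does not guarantee the API will accept the image.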