Overview

The PptxConverter class converts Microsoft PowerPoint .pptx files to Markdown. It extracts slide content including text, images (with AI captioning support), tables, charts, and speaker notes.

Dependencies

pip install 'markitdown[pptx]'
Requires: python-pptx

Accepted Formats

MIME Types
list
  • application/vnd.openxmlformats-officedocument.presentationml*
Extensions
list
  • .pptx

Class Definition

class PptxConverter(DocumentConverter):
    """Converts PPTX files to Markdown.
    
    Supports headings, tables and images with alt text.
    """

Constructor

def __init__(self):
    super().__init__()
    self._html_converter = HtmlConverter()

Methods

accepts()

def accepts(
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> bool
Returns True if the file has a .pptx extension or a MIME type beginning with application/vnd.openxmlformats-officedocument.presentationml.
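The check can be sketched as follows. This is a minimal illustration rather than the library's code: `FakeStreamInfo` is a hypothetical stand-in for StreamInfo, and both the prefix comparison (taken from the wildcard in Accepted Formats) and the case-insensitive matching are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Taken from the Accepted Formats section; the trailing * is treated as a prefix match.
ACCEPTED_MIME_PREFIX = "application/vnd.openxmlformats-officedocument.presentationml"
ACCEPTED_EXTENSION = ".pptx"


@dataclass
class FakeStreamInfo:
    """Hypothetical stand-in for StreamInfo, for illustration only."""
    extension: Optional[str] = None
    mimetype: Optional[str] = None


def looks_like_pptx(info: FakeStreamInfo) -> bool:
    """Return True for a .pptx extension or a PowerPoint MIME type."""
    # Lowercasing both fields is an assumption made for this sketch.
    extension = (info.extension or "").lower()
    mimetype = (info.mimetype or "").lower()
    return extension == ACCEPTED_EXTENSION or mimetype.startswith(ACCEPTED_MIME_PREFIX)
```

Either field alone is enough to match, so a stream with a known MIME type but no filename would still be accepted under this sketch.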

convert()

def convert(
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> DocumentConverterResult
Converts a PPTX presentation to Markdown.

Parameters:
  • file_stream (BinaryIO, required) - Binary stream of the PPTX file
  • stream_info (StreamInfo, required) - Metadata about the file
  • llm_client (OpenAI client, optional) - OpenAI-compatible client for AI image captioning
  • llm_model (str, optional) - Model to use for image captioning (e.g., “gpt-4o”, “gpt-4-vision-preview”)
  • llm_prompt (str, optional) - Custom prompt for image captioning. Overrides the default prompt.
  • keep_data_uris (bool, default: False) - If True, embeds images as base64 data URIs. If False, uses placeholder filenames.

Returns: DocumentConverterResult containing the converted Markdown
Raises: MissingDependencyException if python-pptx is not installed
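To make the effect of keep_data_uris concrete, here is a small sketch of the two link styles. The helper names `to_data_uri` and `image_markdown` are hypothetical and exist only for this illustration; the converter's own internals may differ.

```python
import base64


def to_data_uri(blob: bytes, content_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(blob).decode("ascii")
    return f"data:{content_type};base64,{encoded}"


def image_markdown(alt_text: str, filename: str, blob: bytes,
                   content_type: str, keep_data_uris: bool = False) -> str:
    """Build a Markdown image link, embedding the bytes only when requested."""
    target = to_data_uri(blob, content_type) if keep_data_uris else filename
    return f"![{alt_text}]({target})"
```

Embedding makes the Markdown self-contained at the cost of size, since base64 inflates the image bytes by roughly a third.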

Features

Slide Elements

  • Titles - Converted to Markdown H1 headings
  • Text - Regular text frames preserved
  • Images - Extracted with AI-generated descriptions or alt text
  • Tables - Converted to Markdown tables
  • Charts - Extracted as Markdown tables with data
  • Notes - Speaker notes included under a “### Notes:” heading
  • Grouped Shapes - Recursively processed
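Recursive handling of grouped shapes might look like the sketch below. `FakeShape` is a hypothetical stand-in for a slide shape, used so the example runs without python-pptx; it is not how the converter represents shapes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FakeShape:
    """Hypothetical stand-in for a slide shape; a non-empty `children` list marks a group."""
    name: str
    children: List["FakeShape"] = field(default_factory=list)


def walk_shapes(shapes: List[FakeShape]) -> Iterator[FakeShape]:
    """Yield every non-group shape, descending into groups depth-first."""
    for shape in shapes:
        if shape.children:
            yield from walk_shapes(shape.children)
        else:
            yield shape


slide_shapes = [
    FakeShape("Title 1"),
    FakeShape("Group 2", children=[
        FakeShape("Picture 3"),
        FakeShape("Group 4", children=[FakeShape("TextBox 5")]),
    ]),
]
names = [shape.name for shape in walk_shapes(slide_shapes)]
```

Here `names` contains only the leaf shapes, so content nested inside groups at any depth reaches the output.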

Image Handling

Images can have descriptions from multiple sources:
  1. AI Caption (if llm_client provided) - Generated description using vision model
  2. Embedded Alt Text - Description from PowerPoint
  3. Shape Name - Fallback to shape name
Descriptions are combined and sanitized for Markdown.
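One plausible way to combine and sanitize the three sources is sketched below. The joining rule, the fallback order, and the function name `build_alt_text` are assumptions for illustration; only the idea of removing characters that would break Markdown image syntax comes from the description above.

```python
import re
from typing import Optional


def build_alt_text(ai_caption: Optional[str], embedded_alt: Optional[str],
                   shape_name: str) -> str:
    """Merge the available descriptions into one Markdown-safe alt text."""
    # Join whichever of the first two sources are present; fall back to the shape name.
    parts = [p.strip() for p in (ai_caption, embedded_alt) if p and p.strip()]
    text = " ".join(parts) or shape_name
    # Square brackets and line breaks would break the ![alt](target) syntax.
    text = re.sub(r"[\r\n\[\]]", " ", text)
    # Collapse the whitespace runs left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()
```

For instance, an embedded alt text containing brackets or a newline ends up as a single clean line.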

Chart Extraction

Charts are converted to Markdown tables:
### Chart: Sales by Quarter

| Category | Q1 | Q2 | Q3 | Q4 |
|----------|----|----|----|----|
| Product A | 100 | 150 | 200 | 180 |
| Product B | 80 | 90 | 120 | 140 |
Unsupported chart types show [unsupported chart].
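The table layout above can be reproduced with a short sketch. It assumes the chart data is available as a list of categories plus a mapping from series name to values; `chart_to_markdown` is a hypothetical helper, not the converter's `_convert_chart_to_markdown()`.

```python
from typing import Dict, Sequence


def chart_to_markdown(title: str, categories: Sequence[str],
                      series: Dict[str, Sequence[float]]) -> str:
    """Render chart data as a Markdown table: one column per series, one row per category."""
    header = ["Category", *series.keys()]
    lines = [
        f"### Chart: {title}",
        "",
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * len(header)) + "|",
    ]
    for row_index, category in enumerate(categories):
        cells = [str(points[row_index]) for points in series.values()]
        lines.append("| " + " | ".join([category, *cells]) + " |")
    return "\n".join(lines)


table = chart_to_markdown(
    "Sales by Quarter",
    ["Product A", "Product B"],
    {"Q1": [100, 80], "Q2": [150, 90], "Q3": [200, 120], "Q4": [180, 140]},
)
```

The sketch emits the compact `|---|` separator, which Markdown renderers treat the same as the padded form.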

Example Usage

Basic Conversion

from markitdown.converters import PptxConverter
from markitdown._stream_info import StreamInfo

converter = PptxConverter()

with open("presentation.pptx", "rb") as f:
    stream_info = StreamInfo(extension=".pptx")
    result = converter.convert(f, stream_info)
    print(result.markdown)

With AI Image Captioning

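from openai import OpenAI

from markitdown.converters import PptxConverter
from markitdown._stream_info import StreamInfo

converter = PptxConverter()
client = OpenAI(api_key="your-api-key")  # any OpenAI-compatible client

with open("presentation.pptx", "rb") as f:
    stream_info = StreamInfo(extension=".pptx")
    result = converter.convert(
        f,
        stream_info,
        llm_client=client,
        llm_model="gpt-4o",
        llm_prompt="Describe this slide image in detail.",
    )
    print(result.markdown)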

With Base64 Image Embedding

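from markitdown.converters import PptxConverter
from markitdown._stream_info import StreamInfo

converter = PptxConverter()

with open("presentation.pptx", "rb") as f:
    stream_info = StreamInfo(extension=".pptx")
    result = converter.convert(
        f,
        stream_info,
        keep_data_uris=True,  # Embed images as data URIs
    )
    print(result.markdown)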

Output Example

<!-- Slide number: 1 -->
# Welcome to Our Product

Introduction to the new features

![A screenshot showing the product dashboard with analytics](Picture1.jpg)

### Notes:
Remember to emphasize the key benefits

<!-- Slide number: 2 -->
# Sales Data

### Chart: Quarterly Revenue

| Category | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| Revenue | 100000 | 120000 | 150000 | 180000 |

Shape Processing Order

Shapes are processed in visual order (top-to-bottom, left-to-right) based on their position on the slide:
sorted_shapes = sorted(
    slide.shapes,
    key=lambda x: (x.top, x.left)
)
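The same ordering can be tried without python-pptx using stand-in objects. The sketch below also guards against a shape whose top or left is None, since comparing None with an integer raises TypeError in Python. Whether that case arises in practice, and sorting such shapes first, are assumptions of this sketch.

```python
from types import SimpleNamespace


def position_key(shape):
    """Sort key: top-to-bottom, then left-to-right; unpositioned shapes sort first."""
    top = shape.top if shape.top is not None else float("-inf")
    left = shape.left if shape.left is not None else float("-inf")
    return (top, left)


# Stand-in shapes with positions in arbitrary units (python-pptx uses EMU).
shapes = [
    SimpleNamespace(name="footer", top=900, left=0),
    SimpleNamespace(name="right column", top=200, left=500),
    SimpleNamespace(name="left column", top=200, left=100),
    SimpleNamespace(name="unpositioned", top=None, left=None),
]
ordered = [s.name for s in sorted(shapes, key=position_key)]
```

Shapes that share the same top value are ordered by left, so side-by-side columns read left to right.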

Implementation Details

Source Location

~/workspace/source/packages/markitdown/src/markitdown/converters/_pptx_converter.py:34

Helper Methods

  • _is_picture() - Detects picture shapes
  • _is_table() - Detects table shapes
  • _convert_table_to_markdown() - Converts tables via HTML intermediary
  • _convert_chart_to_markdown() - Extracts chart data as tables

Slide Structure

Each slide includes:
  1. Slide number comment
  2. Title (if present)
  3. Content (shapes in visual order)
  4. Notes section (if present)
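The four parts can be assembled as in the sketch below, which follows the spacing shown in the Output Example. `render_slide` is a hypothetical helper written for this illustration and takes already-converted content blocks as input.

```python
from typing import List, Optional


def render_slide(number: int, title: Optional[str], content: List[str],
                 notes: Optional[str]) -> str:
    """Assemble one slide's Markdown: number comment, title, content, then notes."""
    header_lines = [f"<!-- Slide number: {number} -->"]
    if title:
        header_lines.append(f"# {title}")
    blocks = ["\n".join(header_lines)]
    blocks.extend(block for block in content if block)
    if notes:
        blocks.append(f"### Notes:\n{notes}")
    # Blocks are separated by blank lines, as in the Output Example.
    return "\n\n".join(blocks)


slide_md = render_slide(
    1,
    "Welcome to Our Product",
    ["Introduction to the new features"],
    "Remember to emphasize the key benefits",
)
```

Slides without a title or notes simply omit those parts, leaving the slide number comment followed by the content.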

Limitations

  • Animations and transitions not preserved
  • SmartArt converted to text only
  • Some complex chart types show as [unsupported chart]
  • Video/audio embedded content not extracted
  • Layout and visual styling information lost