PptxConverter

Overview

The PptxConverter class converts Microsoft PowerPoint .pptx files to Markdown. It extracts slide content including text, images (with AI captioning support), tables, charts, and speaker notes.

Dependencies

pip install 'markitdown[pptx]'

Requires: python-pptx

Accepted Formats

MIME Types

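application/vnd.openxmlformats-officedocument.presentationml* (the * marks a prefix match; for example, application/vnd.openxmlformats-officedocument.presentationml.presentation)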

Extensions

.pptx

Class Definition

class PptxConverter(DocumentConverter):
    """Converts PPTX files to Markdown.
    
    Supports headings, tables and images with alt text.
    """

Constructor

def __init__(self):
    super().__init__()
    self._html_converter = HtmlConverter()

Methods

accepts()

def accepts(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> bool

Returns True if the file has a .pptx extension or a MIME type beginning with application/vnd.openxmlformats-officedocument.presentationml.
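A minimal sketch of this check, assuming a simplified stand-in for StreamInfo. The `SimpleStreamInfo` dataclass and the `accepts_pptx` function are hypothetical, not markitdown's API. The MIME comparison is a prefix match, which is what the trailing * under Accepted Formats denotes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for markitdown's StreamInfo: only the two fields used here.
@dataclass
class SimpleStreamInfo:
    extension: Optional[str] = None
    mimetype: Optional[str] = None

ACCEPTED_EXTENSIONS = [".pptx"]
ACCEPTED_MIME_PREFIXES = [
    "application/vnd.openxmlformats-officedocument.presentationml"
]

def accepts_pptx(info: SimpleStreamInfo) -> bool:
    """Return True for a .pptx extension or a PowerPoint MIME type prefix."""
    extension = (info.extension or "").lower()
    mimetype = (info.mimetype or "").lower()
    if extension in ACCEPTED_EXTENSIONS:
        return True
    # Prefix match: covers ...presentationml.presentation and related subtypes.
    return any(mimetype.startswith(prefix) for prefix in ACCEPTED_MIME_PREFIXES)

print(accepts_pptx(SimpleStreamInfo(extension=".PPTX")))  # True
print(accepts_pptx(SimpleStreamInfo(mimetype="application/pdf")))  # False
```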

convert()

def convert(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> DocumentConverterResult

Converts a PPTX presentation to Markdown.

Parameters:

file_stream (BinaryIO, required): Binary stream of the PPTX file.

stream_info (StreamInfo, required): Metadata about the file.

llm_client (OpenAI client, optional): OpenAI-compatible client used for AI image captioning.

llm_model (str, optional): Model to use for image captioning (e.g., "gpt-4o", "gpt-4-vision-preview").

llm_prompt (str, optional): Custom prompt for image captioning. Overrides the default prompt.

keep_data_uris (bool, default: False): If True, embeds images as base64 data URIs. If False, uses placeholder filenames.

Returns: DocumentConverterResult containing the converted Markdown.

Raises: MissingDependencyException if python-pptx is not installed.

Features

Slide Elements

Titles - Converted to Markdown H1 headings
Text - Regular text frames preserved
Images - Extracted with AI-generated descriptions or alt text
Tables - Converted to Markdown tables
Charts - Extracted as Markdown tables with data
Notes - Speaker notes included under a "### Notes:" heading
Grouped Shapes - Recursively processed
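The recursion for grouped shapes can be pictured with a hypothetical `Shape` tree (not python-pptx's shape classes): a group contributes the content of its children, and nested groups are descended into. This is a minimal sketch that visits children in the order given.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical shape model: a leaf carries text, a group carries child shapes.
@dataclass
class Shape:
    text: str = ""
    children: List["Shape"] = field(default_factory=list)

    @property
    def is_group(self) -> bool:
        return bool(self.children)

def shape_content(shape: Shape) -> List[str]:
    """Collect text from a shape, recursing into groups depth-first."""
    if shape.is_group:
        parts: List[str] = []
        for child in shape.children:
            parts.extend(shape_content(child))
        return parts
    return [shape.text] if shape.text else []

slide = [
    Shape(text="Intro"),
    Shape(children=[Shape(text="A"), Shape(children=[Shape(text="B")])]),
]
print([part for s in slide for part in shape_content(s)])  # ['Intro', 'A', 'B']
```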

Image Handling

Images can have descriptions from multiple sources:

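AI Caption (if both llm_client and llm_model are provided) - Generated description using a vision model
Embedded Alt Text - Description stored in the PowerPoint file
Shape Name - Fallback to the shape name

The available descriptions are combined and sanitized for use as Markdown alt text: line breaks and square brackets are replaced with spaces, and whitespace is collapsed.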
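A minimal sketch of combining and sanitizing these sources, assuming "sanitized" means replacing line breaks and square brackets (which would break ![alt](...) syntax) with spaces and collapsing runs of whitespace. `build_alt_text` is a hypothetical helper, not the converter's own method.

```python
import re

def build_alt_text(llm_description: str, embedded_alt: str, shape_name: str) -> str:
    """Combine caption sources into one Markdown-safe alt text (illustrative)."""
    # Keep whichever sources are non-empty; fall back to the shape name.
    parts = [p for p in (llm_description, embedded_alt) if p.strip()]
    alt = "\n".join(parts) or shape_name
    # Newlines and brackets would break ![alt](src), so turn them into spaces.
    alt = re.sub(r"[\r\n\[\]]", " ", alt)
    # Collapse whitespace runs and trim the ends.
    return re.sub(r"\s+", " ", alt).strip()

print(build_alt_text("", "", "Picture 1"))  # Picture 1
print(build_alt_text("A [bar] chart", "Q3 sales\nby region", "Picture 2"))
# A bar chart Q3 sales by region
```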

Chart Extraction

Charts are converted to Markdown tables:

### Chart: Sales by Quarter

| Category | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| Product A | 100 | 150 | 200 | 180 |
| Product B | 80 | 90 | 120 | 140 |

Unsupported chart types show [unsupported chart].
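A minimal sketch of the chart-to-table step, assuming the chart data has already been read into plain Python values (category labels plus one list of values per series). `chart_to_markdown` is hypothetical; it follows the layout shown above, with categories as rows and series as columns.

```python
from typing import List, Optional, Sequence, Tuple

def chart_to_markdown(
    title: Optional[str],
    categories: Sequence[str],
    series: Sequence[Tuple[str, Sequence[object]]],
) -> str:
    """Render chart data as a '### Chart' heading plus a Markdown table."""
    heading = "### Chart" + (f": {title}" if title else "")
    header = ["Category"] + [name for name, _ in series]
    rows: List[List[str]] = [header]
    # One row per category, one column per series (values aligned by index).
    for idx, category in enumerate(categories):
        rows.append([category] + [str(values[idx]) for _, values in series])
    lines = ["| " + " | ".join(row) + " |" for row in rows]
    separator = "|" + "|".join(["---"] * len(header)) + "|"
    table = "\n".join([lines[0], separator] + lines[1:])
    return heading + "\n\n" + table

print(chart_to_markdown("Quarterly Revenue", ["Revenue"], [("Q1", [100000]), ("Q2", [120000])]))
```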

Example Usage

Basic Conversion

from markitdown.converters import PptxConverter
from markitdown._stream_info import StreamInfo

converter = PptxConverter()

with open("presentation.pptx", "rb") as f:
    stream_info = StreamInfo(extension=".pptx")
    result = converter.convert(f, stream_info)
    print(result.markdown)

With AI Image Captioning

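from openai import OpenAI
from markitdown.converters import PptxConverter
from markitdown._stream_info import StreamInfo

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
converter = PptxConverter()

with open("presentation.pptx", "rb") as f:
    stream_info = StreamInfo(extension=".pptx")
    result = converter.convert(
        f,
        stream_info,
        llm_client=client,
        llm_model="gpt-4o",
        llm_prompt="Describe this slide image in detail.",
    )
    print(result.markdown)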

With Base64 Image Embedding

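from markitdown.converters import PptxConverter
from markitdown._stream_info import StreamInfo

converter = PptxConverter()

with open("presentation.pptx", "rb") as f:
    stream_info = StreamInfo(extension=".pptx")
    result = converter.convert(
        f,
        stream_info,
        keep_data_uris=True,  # Embed images as data URIs
    )
    print(result.markdown)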
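With keep_data_uris=True, each image is written inline as a data URI instead of a placeholder filename. The sketch below shows how such a URI can be formed from raw image bytes with the standard base64 module. `to_data_uri` is a hypothetical helper, and the image/png fallback for a missing content type is an assumption.

```python
import base64
from typing import Optional

def to_data_uri(blob: bytes, content_type: Optional[str]) -> str:
    """Encode image bytes as a data URI (illustrative)."""
    mime = content_type or "image/png"  # assumed fallback when the type is unknown
    encoded = base64.b64encode(blob).decode("ascii")
    return f"data:{mime};base64,{encoded}"

alt_text = "Logo"
print(f"![{alt_text}]({to_data_uri(b'abc', 'image/jpeg')})")
# ![Logo](data:image/jpeg;base64,YWJj)
```

Base64 encodes every 3 bytes as 4 characters, so embedded images make the Markdown roughly a third larger than the raw image data.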

Output Example

<!-- Slide number: 1 -->
# Welcome to Our Product

Introduction to the new features

![A screenshot showing the product dashboard with analytics](Picture1.jpg)

### Notes:
Remember to emphasize the key benefits

<!-- Slide number: 2 -->
# Sales Data

### Chart: Quarterly Revenue

| Category | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| Revenue | 100000 | 120000 | 150000 | 180000 |

Shape Processing Order

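Shapes are processed in visual order: they are sorted by their top offset on the slide, and shapes with equal top offsets are ordered by their left offset:

sorted_shapes = sorted(
    slide.shapes,
    key=lambda x: (
        # a missing (None) position is treated as -inf so it sorts first
        float("-inf") if x.top is None else x.top,
        float("-inf") if x.left is None else x.left,
    ),
)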
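The ordering can be tried without a real deck using hypothetical stand-in shapes. This minimal sketch assumes a shape's top or left may be None in some decks, so the key maps missing positions to -inf rather than comparing None with an int (which raises TypeError).

```python
from typing import NamedTuple, Optional

# Hypothetical stand-in exposing only the position attributes used for sorting.
class FakeShape(NamedTuple):
    name: str
    top: Optional[int]
    left: Optional[int]

def visual_order_key(shape: FakeShape) -> tuple:
    # Missing positions become -inf so such shapes sort first.
    top = float("-inf") if shape.top is None else shape.top
    left = float("-inf") if shape.left is None else shape.left
    return (top, left)

shapes = [
    FakeShape("body", 1500, 300),
    FakeShape("title", 200, 300),
    FakeShape("right", 1500, 5000),
    FakeShape("placeholder", None, None),
]
print([s.name for s in sorted(shapes, key=visual_order_key)])
# ['placeholder', 'title', 'body', 'right']
```

Because the key is (top, left), two shapes that look like they sit on the same row but differ slightly in top offset are ordered by top, not by left.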

Implementation Details

Source Location

packages/markitdown/src/markitdown/converters/_pptx_converter.py:34

Helper Methods

_is_picture() - Detects picture shapes
_is_table() - Detects table shapes
_convert_table_to_markdown() - Converts tables via HTML intermediary
_convert_chart_to_markdown() - Extracts chart data as tables
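The table helper goes through HTML so that the HtmlConverter created in the constructor can produce the Markdown table. This minimal sketch covers the first half of that pipeline, assuming the cell texts have already been extracted into a list of rows and that the first row is the header. `rows_to_html_table` is hypothetical. Escaping with html.escape keeps characters such as < and & in cell text from being read as markup.

```python
import html
from typing import List

def rows_to_html_table(rows: List[List[str]]) -> str:
    """Build an HTML table string; the first row becomes <th> header cells."""
    parts = ["<html><body><table>"]
    for i, row in enumerate(rows):
        tag = "th" if i == 0 else "td"
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table></body></html>")
    return "".join(parts)

print(rows_to_html_table([["Name", "Score"], ["A&B", "<5"]]))
# <html><body><table><tr><th>Name</th><th>Score</th></tr><tr><td>A&amp;B</td><td>&lt;5</td></tr></table></body></html>
```

The HTML-to-Markdown step that follows is not reproduced here.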

Slide Structure

Each slide includes:

Slide number comment
Title (if present)
Content (shapes in visual order)
Notes section (if present)
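The per-slide layout can be sketched as a small function that emits these four parts in order. Its output has the same shape as the Output Example, though the exact blank-line spacing of the real converter may differ. `render_slide` and its parameters are hypothetical.

```python
from typing import List, Optional

def render_slide(number: int, title: Optional[str], body: List[str], notes: Optional[str]) -> str:
    """Assemble one slide: number comment, H1 title, body blocks, then notes."""
    parts = [f"<!-- Slide number: {number} -->"]
    if title:
        parts.append("# " + title.strip())
    # Body blocks are assumed to be pre-rendered Markdown in visual order.
    parts.extend(block for block in body if block)
    if notes:
        parts.append("### Notes:\n" + notes.strip())
    return "\n\n".join(parts)

print(render_slide(1, "Welcome to Our Product", ["Introduction to the new features"], None))
```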

Limitations

Animations and transitions not preserved
SmartArt converted to text only
Some complex chart types show as [unsupported chart]
Video/audio embedded content not extracted
Layout and visual styling information lost
