Overview
The PptxConverter class converts Microsoft PowerPoint .pptx files to Markdown. It extracts slide content including text, images (with AI captioning support), tables, charts, and speaker notes.
Dependencies
python-pptx
Accepted Formats
application/vnd.openxmlformats-officedocument.presentationml*
.pptx
Class Definition
Constructor
Methods
accepts()
Returns: True if the file has a .pptx extension or a PowerPoint MIME type.
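As a rough illustration of this check, the sketch below tests an extension and MIME type against the values listed under Accepted Formats. The function name `looks_like_pptx` and its parameters are hypothetical and are not the converter's actual signature.

```python
from typing import Optional

# Values taken from the "Accepted Formats" list in this document.
ACCEPTED_MIME_PREFIX = "application/vnd.openxmlformats-officedocument.presentationml"
ACCEPTED_EXTENSION = ".pptx"


def looks_like_pptx(extension: Optional[str], mimetype: Optional[str]) -> bool:
    """Hypothetical stand-in for accepts(): match on extension or MIME prefix."""
    ext = (extension or "").lower()
    mime = (mimetype or "").lower()
    return ext == ACCEPTED_EXTENSION or mime.startswith(ACCEPTED_MIME_PREFIX)
```

Matching on a MIME prefix rather than the full string mirrors the trailing `*` in the accepted MIME pattern.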
convert()
Binary stream of the PPTX file
Metadata about the file
OpenAI-compatible client for AI image captioning
Model to use for image captioning (e.g., “gpt-4o”, “gpt-4-vision-preview”)
Custom prompt for image captioning. Overrides default prompt.
If True, embeds images as base64 data URIs. If False, uses placeholder filenames.
Returns: DocumentConverterResult with converted Markdown
Raises: MissingDependencyException if python-pptx is not installed
Features
Slide Elements
- Titles - Converted to Markdown H1 headings
- Text - Regular text frames preserved
- Images - Extracted with AI-generated descriptions or alt text
- Tables - Converted to Markdown tables
- Charts - Extracted as Markdown tables with data
- Notes - Speaker notes included under “Notes” section
- Grouped Shapes - Recursively processed
Image Handling
Images can have descriptions from multiple sources:
- AI Caption (if llm_client provided) - Generated description using vision model
- Embedded Alt Text - Description from PowerPoint
- Shape Name - Fallback to shape name
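One way to picture how these sources interact is a simple priority chain. The sketch below assumes the first non-empty source wins. The converter itself may combine sources rather than pick one, and the function names and the placeholder filename are hypothetical.

```python
from typing import Optional


def choose_image_description(
    ai_caption: Optional[str],
    alt_text: Optional[str],
    shape_name: str,
) -> str:
    """Return the first non-empty description: AI caption, alt text, then shape name."""
    for candidate in (ai_caption, alt_text):
        if candidate and candidate.strip():
            # Collapse whitespace so the text fits in a one-line image tag.
            return " ".join(candidate.split())
    return shape_name


def image_markdown(description: str, filename: str) -> str:
    """Render a Markdown image tag; the filename here is only a placeholder."""
    return f"![{description}]({filename})"
```

Collapsing newlines matters because alt text entered in PowerPoint can span several lines, which would otherwise break the Markdown image syntax.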
Chart Extraction
Charts are converted to Markdown tables. Charts that cannot be converted are rendered as [unsupported chart].
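To make the chart-to-table idea concrete, here is a minimal sketch that lays out chart data with one row per category and one column per series. The function name, the "Category" header, and the "### Chart:" heading are assumptions for illustration and may differ from the converter's exact output.

```python
from typing import Dict, List, Sequence


def chart_to_markdown(
    title: str,
    categories: Sequence[str],
    series: Dict[str, Sequence[float]],
) -> str:
    """Build a Markdown table with one row per category and one column per series."""
    header = ["Category"] + list(series.keys())
    lines: List[str] = [f"### Chart: {title}", ""]  # heading format is an assumption
    lines.append("| " + " | ".join(header) + " |")
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    for i, category in enumerate(categories):
        # Take the i-th value from every series to form one table row.
        row = [str(category)] + [str(values[i]) for values in series.values()]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```

This layout only works for charts whose data fits a category-by-series grid, which is one plausible reason some chart types cannot be converted.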
Example Usage
Basic Conversion
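A minimal sketch of a basic conversion, assuming the converter is reached through the markitdown package's top-level `MarkItDown` class and that the package is installed with python-pptx available. The helper name `convert_pptx` is hypothetical, and the import is deferred so the snippet loads even without the package.

```python
def convert_pptx(path: str) -> str:
    """Convert a .pptx file to Markdown text (assumes markitdown and python-pptx are installed)."""
    from markitdown import MarkItDown  # deferred import; needs the markitdown package

    converter = MarkItDown()
    result = converter.convert(path)  # the .pptx path is routed to the PPTX converter
    return result.text_content
```

Calling `convert_pptx("deck.pptx")`, with `deck.pptx` standing in for a real file, would return the whole presentation as a single Markdown string.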
With AI Image Captioning
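A hedged sketch of enabling AI captions, assuming an OpenAI-compatible client from the `openai` package and that `MarkItDown` accepts `llm_client` and `llm_model` keyword arguments. The helper name is hypothetical, and credentials are expected to come from the environment.

```python
def convert_pptx_with_captions(path: str, model: str = "gpt-4o") -> str:
    """Convert a .pptx file, asking a vision model to describe each image."""
    from markitdown import MarkItDown  # deferred imports; both packages must be installed
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    converter = MarkItDown(llm_client=client, llm_model=model)
    return converter.convert(path).text_content
```

Each image triggers a model request, so decks with many pictures will take longer to convert and incur API cost.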
With Base64 Image Embedding
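A sketch of embedding images inline, assuming the option is exposed as a `keep_data_uris` keyword argument to `convert()`. That name is an assumption here, as is the helper name.

```python
def convert_pptx_with_data_uris(path: str) -> str:
    """Convert a .pptx file, embedding images as base64 data URIs."""
    from markitdown import MarkItDown  # deferred import; needs the markitdown package

    converter = MarkItDown()
    # keep_data_uris is assumed to be the option name for base64 embedding.
    result = converter.convert(path, keep_data_uris=True)
    return result.text_content
```

Embedding images makes the Markdown self-contained but can greatly increase its size, so placeholder filenames are often the better choice for text-only pipelines.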
Output Example
Shape Processing Order
Shapes are processed in visual order (top-to-bottom, left-to-right) based on their position on the slide.
Implementation Details
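The visual ordering described under Shape Processing Order can be sketched as a sort on each shape's top and left coordinates. The `Shape` dataclass is a hypothetical stand-in for a slide shape, and placing unpositioned shapes first is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Shape:
    """Hypothetical stand-in for a slide shape; positions may be missing."""
    name: str
    top: Optional[int]
    left: Optional[int]


def visual_order(shapes: List[Shape]) -> List[Shape]:
    """Sort top-to-bottom, then left-to-right; shapes without a position sort first."""
    def key(shape: Shape) -> Tuple[float, float]:
        top = shape.top if shape.top is not None else float("-inf")
        left = shape.left if shape.left is not None else float("-inf")
        return (top, left)

    return sorted(shapes, key=key)
```

Sorting by position rather than by document order matters because shapes are stored in the file in the order they were created, which often differs from how a reader scans the slide.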
Source Location
~/workspace/source/packages/markitdown/src/markitdown/converters/_pptx_converter.py:34
Helper Methods
- _is_picture() - Detects picture shapes
- _is_table() - Detects table shapes
- _convert_table_to_markdown() - Converts tables via HTML intermediary
- _convert_chart_to_markdown() - Extracts chart data as tables
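To illustrate the "HTML intermediary" idea for tables, the sketch below renders cell text as an HTML table, which could then be handed to an HTML-to-Markdown step. The function name and the choice to treat the first row as a header are assumptions, not the converter's actual code.

```python
import html
from typing import List


def table_rows_to_html(rows: List[List[str]]) -> str:
    """Render rows as an HTML table, treating the first row as the header."""
    parts = ["<table>"]
    for index, row in enumerate(rows):
        tag = "th" if index == 0 else "td"
        # Escape cell text so characters like & and < cannot break the markup.
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "".join(parts)
```

Going through HTML lets table conversion reuse whatever HTML-to-Markdown logic already exists, instead of maintaining a separate table formatter.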
Slide Structure
Each slide includes:
- Slide number comment
- Title (if present)
- Content (shapes in visual order)
- Notes section (if present)
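The per-slide layout above can be sketched as a small assembly function. The exact comment text and the notes heading used here are assumptions for illustration, and `render_slide` is a hypothetical helper.

```python
from typing import List, Optional


def render_slide(
    number: int,
    title: Optional[str],
    content_blocks: List[str],
    notes: Optional[str] = None,
) -> str:
    """Assemble one slide: number comment, optional title, content, optional notes."""
    parts = [f"<!-- Slide number: {number} -->"]  # comment wording is an assumption
    if title:
        parts.append(f"# {title}")
    parts.extend(content_blocks)  # already in visual order
    if notes:
        parts.append("### Notes:")  # heading level is an assumption
        parts.append(notes)
    return "\n".join(parts)
```

Using an HTML comment for the slide number keeps slide boundaries recoverable without adding visible text to the rendered Markdown.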
Limitations
- Animations and transitions not preserved
- SmartArt converted to text only
- Some complex chart types show as [unsupported chart]
- Video/audio embedded content not extracted
- Layout and visual styling information lost