
Overview

MarkItDown provides two converters for Excel files:
  • XlsxConverter - For modern Excel files (.xlsx, Excel 2007+)
  • XlsConverter - For legacy Excel files (.xls, Excel 97-2003)
Both converters extract data from all sheets and present each as a separate Markdown table.

Dependencies

pip install 'markitdown[xlsx]'   # XLSX support
pip install 'markitdown[xls]'    # XLS support
XLSX requires: pandas, openpyxl
XLS requires: pandas, xlrd
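To see which of these packages are importable before converting, a short check with the standard library is enough. This is a minimal sketch: `REQUIRED` and `missing_dependencies` are hypothetical names, not part of MarkItDown, and the package lists simply mirror the two lines above.

```python
import importlib.util

# Hypothetical table mirroring the requirements listed above.
REQUIRED = {
    "XlsxConverter": ("pandas", "openpyxl"),
    "XlsConverter": ("pandas", "xlrd"),
}


def missing_dependencies() -> dict[str, list[str]]:
    """Return, per converter, the required packages that cannot be found."""
    report = {}
    for converter, packages in REQUIRED.items():
        # find_spec returns None when a top-level package is not installed
        report[converter] = [p for p in packages if importlib.util.find_spec(p) is None]
    return report


if __name__ == "__main__":
    for converter, missing in missing_dependencies().items():
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{converter}: {status}")
```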

XlsxConverter

Accepted Formats

MIME Types
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Extensions
  • .xlsx

Class Definition

class XlsxConverter(DocumentConverter):
    """Converts XLSX files to Markdown.
    
    Each sheet is presented as a separate Markdown table.
    """

Constructor

def __init__(self):
    super().__init__()
    self._html_converter = HtmlConverter()

Methods

accepts()

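def accepts(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> bool
Returns True if the stream's extension is .xlsx or its MIME type matches the XLSX type listed above; otherwise returns False.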
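The acceptance check can be sketched as follows. This is an illustration, not MarkItDown's code: it assumes extensions are compared case-insensitively and MIME types are compared by prefix. `StreamInfoStub` is a hypothetical stand-in for StreamInfo so the snippet runs without the library, and `accepts_xlsx` is likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for StreamInfo, reduced to the two fields this check reads.
@dataclass
class StreamInfoStub:
    extension: Optional[str] = None
    mimetype: Optional[str] = None


XLSX_EXTENSIONS = [".xlsx"]
XLSX_MIME_PREFIXES = [
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
]


def accepts_xlsx(info: StreamInfoStub) -> bool:
    """Accept on a matching extension or a MIME type starting with an accepted prefix."""
    extension = (info.extension or "").lower()
    mimetype = (info.mimetype or "").lower()
    if extension in XLSX_EXTENSIONS:
        return True
    # A prefix match also tolerates MIME parameters such as "; charset=binary"
    return any(mimetype.startswith(p) for p in XLSX_MIME_PREFIXES)


print(accepts_xlsx(StreamInfoStub(extension=".XLSX")))  # True
print(accepts_xlsx(StreamInfoStub(extension=".xls")))   # False
```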

convert()

def convert(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> DocumentConverterResult
Converts an XLSX file to Markdown.
Returns: a DocumentConverterResult whose markdown contains every sheet as a Markdown table.
Raises: MissingDependencyException if pandas or openpyxl is not installed.

Example Usage

from markitdown.converters import XlsxConverter
from markitdown import StreamInfo

converter = XlsxConverter()

with open("spreadsheet.xlsx", "rb") as f:
    stream_info = StreamInfo(extension=".xlsx")
    result = converter.convert(f, stream_info)
    print(result.markdown)

Output Example

## Sheet1

| Name | Age | City |
| --- | --- | --- |
| Alice | 30 | New York |
| Bob | 25 | San Francisco |
| Charlie | 35 | Chicago |

## Sheet2

| Product | Price | Stock |
| --- | --- | --- |
| Widget A | 19.99 | 100 |
| Widget B | 29.99 | 50 |
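Because each sheet in the output starts with an H2 heading followed by a pipe table, downstream code can split the result's markdown back into per-sheet rows. A minimal sketch, assuming the layout shown above, no escaped pipes inside cells, and no other H2 lines in the output; `split_sheets` is a hypothetical helper.

```python
def split_sheets(markdown: str) -> dict[str, list[list[str]]]:
    """Split converter output into {sheet name: rows}, one list of cells per table row."""
    sheets: dict[str, list[list[str]]] = {}
    current = None
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("## "):
            # A new sheet begins at each H2 heading
            current = line[3:]
            sheets[current] = []
        elif current is not None and line.startswith("|"):
            cells = [c.strip() for c in line.strip("|").split("|")]
            # Skip the "| --- | --- |" separator row under the header
            if all(c and set(c) <= set("-:") for c in cells):
                continue
            sheets[current].append(cells)
    return sheets


example = "## Sheet1\n\n| Name | Age |\n| --- | --- |\n| Alice | 30 |"
print(split_sheets(example))  # {'Sheet1': [['Name', 'Age'], ['Alice', '30']]}
```

The first row of each sheet is the header row, so `rows[0]` gives the column names and `rows[1:]` the data.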

XlsConverter

Accepted Formats

MIME Types
  • application/vnd.ms-excel
  • application/excel
Extensions
  • .xls

Class Definition

class XlsConverter(DocumentConverter):
    """Converts XLS files to Markdown.
    
    Each sheet is presented as a separate Markdown table.
    """

Constructor

def __init__(self):
    super().__init__()
    self._html_converter = HtmlConverter()

Methods

accepts()

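def accepts(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> bool
Returns True if the stream's extension is .xls or its MIME type matches one of the XLS types listed above; otherwise returns False.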

convert()

def convert(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    **kwargs: Any,
) -> DocumentConverterResult
Converts an XLS file to Markdown.
Returns: a DocumentConverterResult whose markdown contains every sheet as a Markdown table.
Raises: MissingDependencyException if pandas or xlrd is not installed.

Example Usage

from markitdown.converters import XlsConverter
from markitdown import StreamInfo

converter = XlsConverter()

with open("legacy_file.xls", "rb") as f:
    stream_info = StreamInfo(extension=".xls")
    result = converter.convert(f, stream_info)
    print(result.markdown)
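When calling a converter directly, acceptance depends on the StreamInfo you pass, so a stream with neither an extension nor a MIME type is declined by both converters under the checks described above. One way to fill in the extension is to sniff the file signature: legacy .xls workbooks are OLE2 Compound File Binary files, and .xlsx workbooks are ZIP archives. A sketch with a hypothetical `guess_excel_extension` helper; other Office formats (.doc, .docx, and so on) share these signatures, so treat the result as a hint only.

```python
import io
from typing import BinaryIO, Optional

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary header (.xls)
ZIP_MAGIC = b"PK\x03\x04"                         # ZIP local file header (.xlsx)


def guess_excel_extension(file_stream: BinaryIO) -> Optional[str]:
    """Guess '.xls' or '.xlsx' from the first bytes, restoring the stream position."""
    start = file_stream.tell()
    try:
        head = file_stream.read(8)
    finally:
        file_stream.seek(start)  # leave the stream where the converter expects it
    if head.startswith(OLE2_MAGIC):
        return ".xls"
    if head.startswith(ZIP_MAGIC):
        # Any ZIP-based file matches here, so this is only a hint
        return ".xlsx"
    return None


print(guess_excel_extension(io.BytesIO(ZIP_MAGIC + b"rest")))  # .xlsx
```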

Implementation Details

Source Location

packages/markitdown/src/markitdown/converters/_xlsx_converter.py
  • XlsxConverter: Line 36
  • XlsConverter: Line 98

Conversion Pipeline

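Both converters use the same process and differ only in the pandas engine used in step 1:
  1. Read All Sheets - Load every worksheet with pandas; sheet_name=None returns a dict of DataFrames keyed by sheet name (XlsConverter passes engine="xlrd" instead of "openpyxl")
    sheets = pd.read_excel(file_stream, sheet_name=None, engine="openpyxl")

  2. Convert to HTML - Each sheet is converted to an HTML table without the DataFrame index
    html_content = sheets[sheet_name].to_html(index=False)

  3. HTML to Markdown - The HTML table is converted to Markdown; convert_string returns a DocumentConverterResult, so the text is read from .markdown
    md_content = self._html_converter.convert_string(html_content).markdown

  4. Combine Sheets - Each sheet's table is prefixed with an H2 heading and the sheets are joined in workbook order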
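The four steps can be sketched as a standalone function. This assumes pandas, the matching engine, and markitdown are installed when `excel_to_markdown` is called; `ENGINES`, `combine_sheets` and `excel_to_markdown` are hypothetical names, and the blank line between heading and table is an assumption that may differ from MarkItDown's own spacing.

```python
from typing import BinaryIO, Dict

# Step 1 engine per format: openpyxl reads .xlsx, xlrd reads .xls.
ENGINES = {".xlsx": "openpyxl", ".xls": "xlrd"}


def combine_sheets(markdown_by_sheet: Dict[str, str]) -> str:
    """Step 4: put an H2 heading before each sheet's table and join the sheets."""
    parts = [f"## {name}\n\n{md.strip()}" for name, md in markdown_by_sheet.items()]
    return "\n\n".join(parts)


def excel_to_markdown(file_stream: BinaryIO, extension: str) -> str:
    """Steps 1-4 end to end. Requires pandas, the engine, and markitdown."""
    # Imported lazily so combine_sheets works without these packages.
    import pandas as pd
    from markitdown.converters import HtmlConverter

    html_converter = HtmlConverter()
    # Step 1: sheet_name=None returns {sheet name: DataFrame} for every sheet
    sheets = pd.read_excel(file_stream, sheet_name=None, engine=ENGINES[extension.lower()])
    markdown_by_sheet = {}
    for name, frame in sheets.items():
        html = frame.to_html(index=False)             # Step 2
        result = html_converter.convert_string(html)  # Step 3
        markdown_by_sheet[name] = result.markdown
    return combine_sheets(markdown_by_sheet)          # Step 4
```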

Sheet Headers

Each sheet is prefixed with an H2 heading using the sheet name:
## SheetName

Features

Supported Elements

  • Data Types - Numbers, text, dates, booleans
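  • Multiple Sheets - All sheets included, each under its own heading
  • Formulas - The result cached in the file is shown, not the formula text; formulas are not recalculated, so cells whose formulas were never calculated (for example, files written by a library rather than a spreadsheet application) come through as missing values
  • Merged Cells - Not expanded; the value appears in the top-left cell of the merged range and the remaining cells are treated as missing values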

Data Handling

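  • Index Column - The pandas DataFrame index is not included (index=False)
  • Column Names - The first row is used as the table header; blank header cells are named "Unnamed: N" by pandas
  • Missing Values - Empty cells are rendered as NaN, the default na_rep of DataFrame.to_html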
  • Formatting - Cell formatting (colors, fonts) not preserved
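Blank cells can surface as placeholder text, because DataFrame.to_html writes missing values as NaN by default and pandas names blank header cells "Unnamed: N". If empty cells are preferred, one option is to post-process the Markdown. A sketch with a hypothetical `blank_placeholders` helper; it assumes every table row starts and ends with `|` and that no cell contains an escaped pipe. Note that genuine cell text "NaN" is blanked too.

```python
import re

# Placeholder cells pandas produces for blank input.
_PLACEHOLDER = re.compile(r"^(NaN|Unnamed: \d+)$")


def blank_placeholders(markdown: str) -> str:
    """Replace NaN and 'Unnamed: n' cells in pipe-table rows with empty cells."""
    out = []
    for line in markdown.splitlines():
        if line.startswith("|") and line.endswith("|"):
            cells = [c.strip() for c in line[1:-1].split("|")]
            cells = ["" if _PLACEHOLDER.match(c) else c for c in cells]
            line = "| " + " | ".join(cells) + " |"
        out.append(line)  # headings and blank lines pass through unchanged
    return "\n".join(out)


print(blank_placeholders("| Unnamed: 0 | Age |\n| --- | --- |\n| NaN | 30 |"))
```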

Limitations

  • Charts and images not extracted
  • Cell styling and colors not preserved
  • Formulas shown as values, not expressions
  • Macros and VBA code not included
  • Multiple tables on one sheet are read as a single table, with the gaps between them appearing as missing values
  • Conditional formatting not preserved
  • Comments and notes not extracted

Performance Considerations

  • Entire workbook loaded into memory
  • Large spreadsheets may require significant RAM
  • Processing time scales with number of sheets and cells
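A coarse guard against very large inputs is to check the stream size before converting. A sketch in which `MAX_WORKBOOK_BYTES`, `stream_size` and `check_size` are hypothetical; because .xlsx files are ZIP-compressed, the on-disk size understates the memory pandas will need, so treat the limit as a rough heuristic.

```python
import io
from typing import BinaryIO

# Hypothetical limit; tune it to the memory available to your process.
MAX_WORKBOOK_BYTES = 50 * 1024 * 1024


def stream_size(file_stream: BinaryIO) -> int:
    """Return the bytes from the current position to the end, keeping the position."""
    start = file_stream.tell()
    size = file_stream.seek(0, io.SEEK_END) - start  # seek returns the new absolute offset
    file_stream.seek(start)
    return size


def check_size(file_stream: BinaryIO, limit: int = MAX_WORKBOOK_BYTES) -> None:
    """Raise ValueError if the remaining stream is larger than the limit."""
    size = stream_size(file_stream)
    if size > limit:
        raise ValueError(f"workbook is {size} bytes, over the {limit}-byte limit")
```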