Monday, 16 December 2024

Python tool for converting PDF files and Office documents to Markdown.

The MarkItDown library is a utility for converting various file formats to Markdown (e.g., for indexing or text analysis).

It presently supports PDF files and Office documents, among other formats.

Installation

You can install markitdown using pip:

pip install markitdown

Or, to install from source in editable mode, run the following from the root of a cloned copy of the repository:

pip install -e .

Usage

To use this as a command-line utility, install it and then run it like this:

markitdown path-to-file.pdf

This writes the Markdown to standard output. To save it to a file, redirect the output:

markitdown path-to-file.pdf > document.md
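To convert many files at once, the redirect above can be repeated for each file from a Python script. This is a minimal sketch, not part of MarkItDown: it assumes the `markitdown` command is on your PATH, and `convert_folder` and its `command` parameter are hypothetical names used only for illustration.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def convert_folder(folder: Path, pattern: str = "*.pdf",
                   command: Sequence[str] = ("markitdown",)) -> List[Path]:
    """Run the CLI on each matching file, saving stdout to <name>.md beside it.

    Per file this mirrors: markitdown path-to-file.pdf > document.md
    `command` is a hypothetical parameter that lets another executable stand in.
    """
    written = []
    # sorted() builds the full list first, so newly written .md files are never re-globbed
    for src in sorted(folder.glob(pattern)):
        dest = src.with_suffix(".md")
        with dest.open("wb") as out:
            # check=True raises CalledProcessError if the tool exits non-zero
            subprocess.run([*command, str(src)], stdout=out, check=True)
        written.append(dest)
    return written
```

For example, `convert_folder(Path("."))` would write `report.md` next to each `report.pdf` in the current directory. The `command` parameter only exists so the loop can be tried with a stand-in such as `cat` before installing the tool.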

To read from standard input instead, omit the file argument and pipe the content in:

cat path-to-file.pdf | markitdown
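The same conversion can also be done from Python without the command line. The sketch below assumes the Python API shown in the project README at the time of writing (`MarkItDown().convert(...)` returning a result whose `text_content` attribute holds the Markdown); check the current README, as the API may have changed. The `file_to_markdown` helper and its `converter_factory` parameter are hypothetical additions for illustration.

```python
from typing import Any, Callable, Optional


def file_to_markdown(path: str,
                     converter_factory: Optional[Callable[[], Any]] = None) -> str:
    """Convert one file to Markdown using MarkItDown's Python API.

    `converter_factory` is a hypothetical hook (not part of MarkItDown) so a
    stand-in converter can be injected; by default the real MarkItDown is used.
    """
    if converter_factory is None:
        # Imported lazily so this module still loads where markitdown is absent
        from markitdown import MarkItDown
        converter_factory = MarkItDown
    converter = converter_factory()
    result = converter.convert(str(path))  # result object from the converter
    return result.text_content  # the converted Markdown text
```

With markitdown installed, `print(file_to_markdown("path-to-file.pdf"))` should print the Markdown that the CLI writes to standard output.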

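Running Tests

To run the tests, install Hatch using pip (or another method described in the Hatch installation documentation), then open a Hatch shell and run the test suite: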

pip install hatch
hatch shell
hatch test

Before submitting a pull request, please run the pre-commit checks:

pre-commit run --all-files

Source: https://github.com/microsoft/markitdown