Library API Documentation

This page contains simple library usage examples and the module-level documentation for each of the importable modules in docp-parsers.

Use Cases 

To save you digging through the documentation for each module and cobbling together what a ‘standard use case’ may look like, a couple of examples are provided here.

Extract text from a PDF file

>>> from docp_parsers import PDFParser

>>> pdf = PDFParser(path='/path/to/myfile.pdf')
>>> pdf.extract_text()

# Access the content of page 1.
>>> pg1 = pdf.pages[1].content

Extracting text from a PowerPoint presentation

>>> from docp_parsers import PPTXParser

>>> pptx = PPTXParser(path='/path/to/myfile.pptx')
>>> pptx.extract_text()

# Access the text on slide 1.
>>> slide1 = pptx.slides[1].content
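Both use cases follow the same steps: construct the parser with path=, call extract_text(), then index into the parsed pages or slides. If you need to handle a mix of file types, a small helper that chooses the parser by file extension can wrap those steps. This is a minimal sketch only: parse_file and the parsers mapping are hypothetical names that are not part of docp-parsers, and the sketch assumes each parser class accepts a path= keyword and exposes extract_text(), exactly as in the two examples above.

```python
from pathlib import Path


def parse_file(path, parsers):
    """Choose a parser by file extension, extract the text and return the parser.

    Hypothetical helper, not part of docp-parsers. ``parsers`` maps a
    lower-case extension (e.g. '.pdf') to a parser class which is assumed to
    accept a ``path=`` keyword and expose ``extract_text()``.
    """
    ext = Path(path).suffix.lower()  # '.PDF' and '.pdf' are treated alike
    try:
        parser_cls = parsers[ext]
    except KeyError:
        raise ValueError(f'No parser registered for {ext!r} files: {path}') from None
    parser = parser_cls(path=str(path))
    parser.extract_text()  # same call as in the use cases above
    return parser
```

Typical usage, mirroring the PDF example above (requires docp-parsers to be installed):

>>> from docp_parsers import PDFParser, PPTXParser
>>> parsers = {'.pdf': PDFParser, '.pptx': PPTXParser}
>>> pdf = parse_file('/path/to/myfile.pdf', parsers)
>>> pg1 = pdf.pages[1].content

Passing the mapping in, rather than importing the parsers inside the helper, keeps the helper independent of which parsers you choose to register.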

Module Documentation 

In addition to the module-level documentation, most of the public classes and/or methods come with one or more usage examples and access to the source code itself.

There are two types of modules listed here:

Those whose API is designed to be accessed by the user/caller

Those which are designated ‘private’ and designed only for internal use

We’ve exposed both here for completeness and to aid understanding of how the library is implemented.
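If you are exploring an installed package yourself, the public/private split can be approximated programmatically. The sketch below assumes the widespread Python convention that internal modules are named with a leading underscore; this page does not say whether docp-parsers follows that convention, so treat the result as a hint rather than an authoritative list. split_modules is a hypothetical helper built only on the standard library (importlib and pkgutil).

```python
import importlib
import pkgutil


def split_modules(package_name):
    """Split a package's direct submodules into (public, private) name lists.

    Hypothetical helper. Assumes private modules carry a leading underscore,
    which is a common Python convention rather than a guarantee.
    """
    package = importlib.import_module(package_name)
    public, private = [], []
    # iter_modules lists the modules and subpackages directly under the
    # package's __path__ (it does not recurse into subpackages).
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith('_'):
            private.append(info.name)
        else:
            public.append(info.name)
    return sorted(public), sorted(private)
```

For example, >>> public, private = split_modules('docp_parsers') would list the top-level modules of the installed package, assuming it is importable under that name.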

Last updated: 24 Jan 2026
