Module: objects/pptxobject.py
- Purpose:
This module provides the ‘PPTX Document’ object structure into which MS PowerPoint documents are parsed for transport and onward use.
- Platform:
Linux/Windows | Python 3.11+
- Developer:
J Berendt
- Email:
- Comments:
n/a
- class DocPPTX
Bases: _DocBase
Container class for storing data parsed from a PPTX file.
- property slides: list[SlideObject]
A list containing an object for each slide in the document.
Tip
The slide number index aligns to the slide number in the PPTX file.
For example, to access the SlideObject for slide 42, use: slides[42]
- property basename: str
Accessor for the file’s basename.
- property documents: list
Accessor to the Document objects.
These objects are used for passing into text splitters and for loading documents (and embeddings) into vector databases.
- property filepath: str
Accessor for the explicit path to this file.
- property metadata: dict | object
The metadata as extracted from the document.
- property npages: int
The number of pages successfully extracted from the source.
- property ntables: int
The number of tables successfully extracted from the source.
- property parser: object
Accessor to the underlying document parser’s functionality.
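Example
The sketch below illustrates the slide-number indexing convention and the basename accessor described above. It does not use the library's own classes: FakeSlide and FakeDocPPTX are hypothetical stand-ins, and the placeholder at index 0 is only one assumed way of making slides[n] line up with slide n. The source does not say how DocPPTX implements this.

```python
import os
from dataclasses import dataclass, field


@dataclass
class FakeSlide:
    """Hypothetical stand-in for SlideObject (not the library's class)."""
    number: int
    text: str


@dataclass
class FakeDocPPTX:
    """Hypothetical stand-in exposing a few of the documented properties."""
    filepath: str
    # Assumption: index 0 holds a placeholder so that slides[n] is slide n.
    slides: list = field(default_factory=lambda: [None])

    @property
    def basename(self) -> str:
        # The file name without its directory path.
        return os.path.basename(self.filepath)


doc = FakeDocPPTX(filepath="/tmp/deck.pptx")
doc.slides.extend(FakeSlide(n, f"Slide {n}") for n in range(1, 51))

# The index matches the slide number shown in the presentation.
slide = doc.slides[42]
print(doc.basename, slide.number, slide.text)
```

With a real DocPPTX instance, the same access pattern (slides[42] for slide 42) is what the tip above describes.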