Module: objects/pptxobject.py

Purpose:

This module provides the ‘PPTX Document’ object structure into which MS PowerPoint documents are parsed for transport and onward use.

Platform:

Linux/Windows | Python 3.11+

Developer:

J Berendt

Email:

development@s3dev.uk

Comments:

n/a

class DocPPTX

Bases: _DocBase

Container class for storing data parsed from a PPTX file.
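The container pattern described above, where parsed data is exposed through read-only properties, can be sketched as follows. `SketchDoc` is a hypothetical stand-in, not the library's `_DocBase` or `DocPPTX`. It also assumes that `basename` is derived from `filepath`, which this documentation does not state explicitly.

```python
import os


class SketchDoc:
    """Hypothetical, minimal stand-in for a parsed-document container."""

    def __init__(self, filepath: str) -> None:
        # Store the path as given; the properties below expose it read-only.
        self._filepath = filepath
        self._metadata: dict = {}

    @property
    def filepath(self) -> str:
        """Explicit path to the source file."""
        return self._filepath

    @property
    def basename(self) -> str:
        """File name component of the path (assumed derivation)."""
        return os.path.basename(self._filepath)

    @property
    def metadata(self) -> dict:
        """Metadata extracted from the document (empty in this sketch)."""
        return self._metadata


doc = SketchDoc("/data/decks/quarterly_review.pptx")
print(doc.basename)  # quarterly_review.pptx
```

Because the properties define no setter, an accidental assignment such as `doc.basename = "x"` raises `AttributeError` instead of silently replacing the parsed value.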

property slides: list[SlideObject]

A list containing a SlideObject for each slide in the document.

Tip

The list index aligns with the slide number in the PPTX file.

For example, to access the SlideObject for slide 42, use:

slides[42]

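One way to make a list index line up with 1-based slide numbers is to reserve index 0 for a placeholder. The sketch below assumes that approach; `SketchSlide` and `build_slides` are hypothetical stand-ins for `SlideObject` and the parser, and this documentation does not state exactly how the library achieves the alignment.

```python
from dataclasses import dataclass, field


@dataclass
class SketchSlide:
    """Hypothetical stand-in for SlideObject."""
    number: int
    texts: list[str] = field(default_factory=list)


def build_slides(n: int) -> list[SketchSlide]:
    """Return n slides with a placeholder at index 0, so slides[i].number == i."""
    # Index 0 holds a placeholder, so real slides start at index 1.
    slides = [SketchSlide(number=0)]
    slides.extend(SketchSlide(number=i) for i in range(1, n + 1))
    return slides


slides = build_slides(50)
print(slides[42].number)  # 42
```

Under this assumption, iterating over the real slides means starting from `slides[1:]`.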
property basename: str

Accessor for the file’s basename.

property documents: list

Accessor to the Document objects.

These objects are passed into text splitters and are used to load documents (and their embeddings) into vector databases.
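To show what passing these objects into a text splitter involves, here is a minimal sketch. The `Document` dataclass is a hypothetical stand-in with a `page_content` string and a `metadata` dict (a shape common in LangChain-style pipelines; the library's actual Document type may differ), and `split_documents` is a simple fixed-width splitter with overlap, not any specific library's splitter.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document objects."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_documents(docs: list[Document], size: int, overlap: int) -> list[Document]:
    """Split each document into fixed-size character chunks that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for doc in docs:
        text = doc.page_content
        if not text:
            continue  # Nothing to embed for an empty document.
        for start in range(0, max(len(text) - overlap, 1), step):
            # Copy the parent metadata so each chunk stays traceable to its source.
            meta = {**doc.metadata, "start": start}
            chunks.append(Document(text[start:start + size], meta))
    return chunks


docs = [Document("Slide 1: Revenue grew in Q3.", {"source": "deck.pptx", "slide": 1})]
chunks = split_documents(docs, size=12, overlap=4)
print(len(chunks))  # 3
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a chunk boundary still appears whole in at least one embedding in many cases.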

property filepath: str

Accessor for the explicit path to this file.

property metadata: dict | object

The metadata as extracted from the document.

property npages: int

The number of pages successfully extracted from the source.

property ntables: int

The number of tables successfully extracted from the source.

property parser: object

Accessor to the underlying document parser’s functionality.