Python gfx API

From Swftools

gfx


This module contains a PDF parser (based on xpdf) and a number of rendering backends. In particular, it can extract text from PDF pages, create bitmaps from them, or convert PDF files to SWF. The latter functionality is similar to what the pdf2swf utility offers, but more powerful: you can also create individual SWF files from single pages of a PDF, or mix pages from different PDF files.

Classes

[__builtin__.html#object __builtin__.object]
    [gfx.html#Doc Doc]
    [gfx.html#Output Output]
    [gfx.html#Page Page]

class Doc([__builtin__.html#object __builtin__.object])

A [#Doc Doc] [__builtin__.html#object object] is used for storing a document (like a PDF). doc.pages contains the number of pages in the document, and doc.filename the name of the file the document was created (loaded) from. If the document was created from an image file, the number of pages is always 1.

Methods defined here:

getInfo(...): [#Doc-getInfo getInfo](key) Retrieve some information about a document. For PDF files, key can have the following values: "title", "subject", "keywords", "author", "creator", "producer", "creationdate", "moddate", "linearized", "tagged", "encrypted", "oktoprint", "oktocopy", "oktochange", "oktoaddnotes", "version". If the "oktocopy" digital rights management flag is set to "no", the PDF parser won't allow you to access the PDF file, and trying to extract pages from it will raise an exception.
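As a minimal sketch, the keys listed above can be collected into a dictionary in one pass. The names `PDF_INFO_KEYS` and `collect_info` are illustrative and not part of gfx; the helper assumes only that `doc` offers `getInfo(key)` as described above.

```python
# Keys documented for PDF files in Doc.getInfo().
PDF_INFO_KEYS = (
    "title", "subject", "keywords", "author", "creator", "producer",
    "creationdate", "moddate", "linearized", "tagged", "encrypted",
    "oktoprint", "oktocopy", "oktochange", "oktoaddnotes", "version",
)


def collect_info(doc, keys=PDF_INFO_KEYS):
    """Return a dict mapping each key to doc.getInfo(key).

    collect_info is a hypothetical helper, not a gfx function.
    """
    return dict((key, doc.getInfo(key)) for key in keys)
```

With a document opened through the module's open() function, `collect_info(doc)["oktocopy"]` would then give the copy flag without a separate getInfo() call per key.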

getPage(...): [#Doc-getPage getPage](nr) Get one page from a document file. The nr parameter specifies which page to retrieve. Counting starts at 1, so the first page can be retrieved by page = doc.[#Doc-getPage getPage](1). You can find out how many pages a document contains by querying its pages field (doc.pages).
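Because page numbers start at 1 rather than 0, an off-by-one error is easy to make when looping. A small generator can hide the arithmetic; `iter_pages` is a hypothetical helper that relies only on doc.pages and doc.getPage(nr) as documented above.

```python
def iter_pages(doc):
    """Yield every page of doc in order, using 1-based page numbers.

    iter_pages is a hypothetical helper, not a gfx function.
    """
    for nr in range(1, doc.pages + 1):
        yield doc.getPage(nr)
```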

setparameter(...): [#Doc-setparameter setparameter](key, value) Pass a parameter or setting to the document parser. Unlike parameters set with the module-level [#-setparameter setparameter]() function, parameters set using this method are only valid for the [__builtin__.html#object object] itself, during its lifetime.

class Output([__builtin__.html#object __builtin__.object])

An [#Output Output] [__builtin__.html#object object] can be used as the parameter to the render() call of a page. It's not possible to create this type of [__builtin__.html#object object] directly (i.e., from a class); however, you can use a [#-PassThrough PassThrough]() device to pass things over to Python. Examples of devices implementing the [#Output Output] class are ImageList, SWF, PlainText and PassThrough.

Methods defined here:

endpage(...): [#Output-endpage endpage]() Ends a page in the output device. This function should be called once for every [#Output-startpage startpage]() call.

fill(...): [#Output-fill fill]() Fill a polygon with a color.

fillbitmap(...): [#Output-fillbitmap fillbitmap]() Fill a polygon with a bitmap pattern.

save(...): [#Output-save save](filename) Saves the contents of an output device to a file. Depending on what the output device is, the contents of the file may be plain text, an image, an SWF file, etc. For the ImageList device, several files (named filename.1.png, filename.2.png etc.) might be created.

setparameter(...): [#Output-setparameter setparameter](key, value) Set an output-device-dependent parameter.

startpage(...): [#Output-startpage startpage](width, height) Starts a new page/frame in the output device. The usual way to render documents is to start a new page in the device for each page in the document:

    for pagenr in range(1, doc.pages+1):
        page = doc.getPage(pagenr)
        output.startpage(page.width, page.height)
        page.render(output)
        output.endpage()

It is, however, also possible to render more than one document page to a single output page, e.g. for side-by-side or book views.
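The side-by-side case could look roughly like the sketch below. `render_side_by_side` is a hypothetical helper, and it assumes that the move parameter of page.render() is expressed in the same units as page.width and page.height, which this reference does not state explicitly.

```python
def render_side_by_side(left, right, output):
    """Render two document pages next to each other on one output page.

    Hypothetical helper, not a gfx function. Assumes move=(x, y) uses the
    same units as page.width and page.height.
    """
    width = left.width + right.width
    height = max(left.height, right.height)
    output.startpage(width, height)
    left.render(output)                         # left page at the origin
    right.render(output, move=(left.width, 0))  # right page shifted to the right
    output.endpage()
```

The output page is made as wide as both pages together and as tall as the taller one, so neither page needs to be clipped.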

stroke(...): [#Output-stroke stroke]() Stroke a polygon with a color.

class Page([__builtin__.html#object __builtin__.object])

A [#Page Page] [__builtin__.html#object object] contains a single page of a document. page.width and page.height (or page.size) contain the page dimensions. page.nr is the number of the page, and page.doc is the parent document.

Methods defined here:

asImage(...): [#Page-asImage asImage](width, height) Creates a bitmap from a page. The bitmap will be returned as a string containing RGB triplets. The bitmap will be rescaled to the specified width and height. The aspect ratio of width and height doesn't need to match that of the page.
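To read a single pixel out of such a string, the byte offset has to be computed by hand. The sketch below assumes that rows are stored one after another, top to bottom, with 3 bytes per pixel and no padding; the description above only guarantees RGB triplets, so treat the layout as an assumption. `pixel_at` is a hypothetical helper.

```python
def pixel_at(data, width, x, y):
    """Return the (r, g, b) tuple at column x, row y of an RGB buffer.

    Hypothetical helper, not a gfx function. Assumes row-major storage with
    3 bytes per pixel and no row padding.
    """
    offset = (y * width + x) * 3
    # bytearray() yields integers for both a Python 2 str and Python 3 bytes.
    r, g, b = bytearray(data[offset:offset + 3])
    return (r, g, b)
```

Here `width` must be the same width that was passed to asImage().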

render(...): [#Page-render render](output, move=(0,0), clip=None) Renders a page to the rendering backend specified by the output parameter. Rendering consists of calling a number of functions on the output device; see the description of the "PassThrough" device. The page may be shifted to a given position using the move parameter, and may also be clipped to a specific size using the clip parameter. The clipping operation is applied after the move operation.

Functions

ImageList(...): [#-ImageList ImageList]() Creates a device which renders documents to bitmaps. Each page that is rendered will create a new bitmap. Using save(), you can save the images to a number of files.

OCR(...): [#-OCR OCR]() Creates a device which processes documents using OCR (optical character recognition). This is handy, for example, for extracting full text from PDF documents which have broken fonts and for which the "PlainText" device therefore doesn't work.

OpenGL(...): [#-OpenGL OpenGL]() Creates a device which renders everything to OpenGL. Can be used for desktop display and debugging. This device is not available on all systems.

PassThrough(...): [#-PassThrough PassThrough](device) Creates a PassThrough device, which can be used as parameter in calls to page.render(). device needs to be a class implementing at least the following functions:

    setparameter(key, value)
    startclip(outline)
    endclip()
    stroke(outline, width, color, capstyle, jointstyle, miterLimit)
    fill(outline, color)
    fillbitmap(outline, image, matrix, colortransform)
    fillgradient(outline, gradient, gradienttype, matrix)
    addfont(font)
    drawchar(font, glyph, color, matrix)
    drawlink(outline, url)

If any of these functions are not defined, an error message will be printed; however, the rendering process will *not* be aborted.
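A device that simply records every call is a convenient starting point for experimenting with PassThrough. The class below is a sketch: `RecordingDevice` and its `calls` attribute are made-up names, and only the method names and argument lists are taken from the description above. Whether PassThrough expects the class itself or an instance is not stated here, so the sketch only defines the class.

```python
class RecordingDevice(object):
    """Records every drawing call it receives (hypothetical example class)."""

    def __init__(self):
        self.calls = []  # list of (function name, argument tuple) entries

    def _record(self, name, *args):
        self.calls.append((name, args))

    def setparameter(self, key, value):
        self._record("setparameter", key, value)

    def startclip(self, outline):
        self._record("startclip", outline)

    def endclip(self):
        self._record("endclip")

    def stroke(self, outline, width, color, capstyle, jointstyle, miterLimit):
        self._record("stroke", outline, width, color,
                     capstyle, jointstyle, miterLimit)

    def fill(self, outline, color):
        self._record("fill", outline, color)

    def fillbitmap(self, outline, image, matrix, colortransform):
        self._record("fillbitmap", outline, image, matrix, colortransform)

    def fillgradient(self, outline, gradient, gradienttype, matrix):
        self._record("fillgradient", outline, gradient, gradienttype, matrix)

    def addfont(self, font):
        self._record("addfont", font)

    def drawchar(self, font, glyph, color, matrix):
        self._record("drawchar", font, glyph, color, matrix)

    def drawlink(self, outline, url):
        self._record("drawlink", outline, url)
```

After rendering, inspecting `calls` shows which drawing operations a page produced and in what order, which can help before writing a real backend.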

PlainText(...): [#-PlainText PlainText]() Creates a device which can be used to extract text from documents, by passing it as parameter to page.render(). The extracted text can be saved by plaintext.save(filename).

SWF(...): [#-SWF SWF]() Creates a device which renders documents to SWF (Flash) files. Depending on the way the document parser behaves (see the poly2bitmap and bitmap parameters), the resulting SWF might use vector operations and Flash Texts to display the document, or just a single bitmap.

addfont(...): [#-addfont addfont](filename) Passes an additional font file to the PDF parser. If a PDF contains external fonts (i.e. fonts which are not contained in the PDF itself) then the files added by [#-addfont addfont]() will be searched.

addfontdir(...): [#-addfontdir addfontdir](dirname) Passes a complete directory containing fonts to the PDF parser. Any font file within this directory might be used to resolve external fonts in PDF files.

open(...): [#-open open](type, filename) -> [__builtin__.html#object object] Open a PDF, SWF or image file. The type argument should be "pdf", "swf" or "image" accordingly. It returns a doc [__builtin__.html#object object] which can be used to process the file contents. E.g.

    doc = open("pdf", "document.pdf")
    doc = open("swf", "flashfile.swf")
    doc = open("image", "image.png")

If the file could not be loaded, or is an encrypted PDF file without a proper password specified, an exception is raised. If the filename argument contains a '|' char, everything after the '|' is treated as the password used for opening the file. E.g.

    doc = open("pdf", "document.pdf|mysecretpassword")

Notice that for image files, the only supported file formats right now are jpeg and png.
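Since the password travels inside the filename argument, a small wrapper can keep the two apart in calling code. `filename_with_password` is a hypothetical helper. How a '|' inside the file name itself is handled is not documented here, so the sketch rejects such names rather than guessing.

```python
def filename_with_password(filename, password=None):
    """Build the filename argument for open(), optionally with a password.

    Hypothetical helper, not a gfx function. Appends '|' and the password,
    following the convention described above.
    """
    if "|" in filename:
        # Behaviour for such names is undocumented; refuse instead of guessing.
        raise ValueError("file name must not contain '|'")
    if password is None:
        return filename
    return filename + "|" + password
```

The result can be passed as the second argument to the module's open() function.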

setparameter(...): [#-setparameter setparameter](key,value) Set a parameter in the gfx module (which might affect the PDF parser or any of the rendering backends). This is a parameter which would usually be passed with the "-s" option to pdf2swf. For a list of all parameters, see the output of pdf2swf -s help and pdf2swf somefile.pdf -s help.

verbose(...): [#-verbose verbose](level) Set the logging verbosity of the gfx module. Log levels are:

    level=-1            Log nothing
    level=0 (fatal)     Log only fatal errors
    level=1 (error)     Log only fatal errors and errors
    level=2 (warn)      Log all errors and warnings
    level=3 (notice)    Log also some rudimentary data about the parsing/conversion
    level=4 (verbose)   Log some additional parsing information
    level=5 (debug)     Log debug statements
    level=6 (trace)     Log extended debug statements

All logging messages are written to stdout.
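Named levels tend to read better in calling code than bare integers. The mapping below mirrors the levels listed above; `LOG_LEVELS` and `log_level` are illustrative names, and "none" is a made-up label for level -1, which has no name in the list.

```python
# Numeric values as documented for verbose(); the names are for convenience.
LOG_LEVELS = {
    "none": -1,   # assumed label; the documentation gives -1 no name
    "fatal": 0,
    "error": 1,
    "warn": 2,
    "notice": 3,
    "verbose": 4,
    "debug": 5,
    "trace": 6,
}


def log_level(name):
    """Translate a level name such as "warn" into its numeric level."""
    try:
        return LOG_LEVELS[name.lower()]
    except KeyError:
        raise ValueError("unknown log level: %r" % (name,))
```

The returned number is what verbose() expects as its level argument.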