DynaPDF Manual - Page 472

character. As long as the text is not rotated, it is relatively easy to determine whether a text record
lies on the same line (that is, has the same y-coordinate), but finding arbitrarily rotated text that is
also stored in several different text objects requires further math.
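The "further math" for rotated text can be sketched as follows. This is a minimal, library-independent illustration, not DynaPDF code: the function name same_baseline, the (x, y, angle) tuples, and both tolerance values are assumptions made for this example. Two text records are treated as lying on the same line when they have the same orientation and the origin of the second record has (almost) no perpendicular distance to the baseline of the first.

```python
import math

def same_baseline(a, b, dist_tol=0.5, angle_tol=0.5):
    """Return True if two text records share a baseline.

    a and b are hypothetical (x, y, angle_in_degrees) tuples in user space.
    dist_tol is in user space units, angle_tol in degrees (both assumed values).
    """
    ax, ay, a_deg = a
    bx, by, b_deg = b
    # Signed angle difference normalized to [-180, 180), so 359.8 and 0.1 compare as close.
    diff = (a_deg - b_deg + 180.0) % 360.0 - 180.0
    if abs(diff) > angle_tol:
        return False
    # Unit vector along the baseline of record a.
    ux = math.cos(math.radians(a_deg))
    uy = math.sin(math.radians(a_deg))
    dx, dy = bx - ax, by - ay
    # The 2D cross product of a unit vector with the offset is the
    # perpendicular distance of b's origin from a's baseline.
    return abs(ux * dy - uy * dx) <= dist_tol
```

For non-rotated text (angle 0) the test reduces to comparing the two y-coordinates, which is the simple case described above.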
The position of a text object is calculated from the two transformation matrices ctm and tm. The
global transformation matrix ctm represents the coordinate system that was current when a text showing
operator was found. The matrix ctm is already pre-multiplied, that is, it already includes all
previously applied transformations, because GetPageText() does not return when a new transformation
matrix is applied.
The text transformation matrix tm represents the text coordinate system in which text properties
such as text width, font size, character spacing, word spacing, or the space width are calculated. All
text positioning operators are already included in this matrix.
The combination of both matrices represents the final user space in which the text is rendered. Both
matrices must be combined to enable the calculation of the text position and orientation (see the
examples on the following pages to determine how the matrices must be combined).
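As a rough illustration of what such a combination involves, the sketch below multiplies two matrices given in the usual six-number PDF form (a, b, c, d, x, y) and derives the text origin and rotation angle from the result. It is a generic sketch, not DynaPDF code: the helper names mul_matrix and text_origin_and_angle are hypothetical, and the multiplication order (tm applied first, then ctm) is an assumption based on the row-vector convention of the PDF specification. The manual's own examples remain the authoritative reference for the order DynaPDF expects.

```python
import math

def mul_matrix(m1, m2):
    """Return the matrix that applies m1 first and m2 second.

    Each matrix is a tuple (a, b, c, d, x, y) that maps a point (px, py) to
    (a*px + c*py + x, b*px + d*py + y), as in the PDF row-vector convention.
    """
    a1, b1, c1, d1, x1, y1 = m1
    a2, b2, c2, d2, x2, y2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        x1 * a2 + y1 * c2 + x2,
        x1 * b2 + y1 * d2 + y2,
    )

def text_origin_and_angle(ctm, tm):
    """Combine tm and ctm (assumed order: tm first) and return ((x, y), angle in degrees)."""
    a, b, c, d, x, y = mul_matrix(tm, ctm)
    # The image of the text space x-axis is the vector (a, b); its direction
    # gives the rotation of the baseline in user space.
    angle = math.degrees(math.atan2(b, a))
    return (x, y), angle
```

For example, with tm = (1, 0, 0, 1, 10, 20) and ctm = (0, 1, -1, 0, 100, 50), the text origin (10, 20) in text space is first moved by tm and then rotated by 90 degrees and shifted by ctm.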
Organization of content streams and pages
A PDF page consists of a content stream and a resource array which contains the resources such as
fonts, images, and so on which can be used by the page. The content stream contains the PDF
operators which paint the contents of a PDF page.
The PDF format supports two object types which support vector graphics and images: ordinary
pages and the so-called "Form XObjects", which act as templates (we call this object type template
here). A template in turn consists of a content stream and a resource array, like a page object, and it
is possible to convert a page to a template. A page object can display an arbitrary number of templates
and a template can in turn display arbitrary other templates. It is important to understand that the
content of a template is physically stored in another content stream because the function InitStack()
prepares only the currently open content stream of the page or template for editing.
Only this content stream can be parsed and edited. Templates which occur in a page or other
template must be parsed separately. Because templates can contain other templates it is usually best
to parse templates recursively.
However, if text must be deleted or replaced, you must make sure that a template is not edited
twice if it also occurs in another page or template. Such a duplicate check is mandatory and it
must be applied every time a template is about to be processed.
Whether a page or template contains templates can be determined with GetTemplCount(). Such
nested templates can then be opened for editing with EditTemplate(). When finished, the template
must be closed with EndTemplate().
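The recursive traversal with a duplicate check can be sketched independently of the library. In the minimal example below, the dictionary children_of is a hypothetical stand-in for the template handles a page or template reports, and the callback process stands for the open, parse or edit, and close sequence described above. No real DynaPDF calls are used, and all names are assumptions made for this illustration.

```python
def process_templates(children_of, root, process):
    """Visit every template reachable from root exactly once.

    children_of: hypothetical dict mapping a page or template handle to the
                 list of template handles it displays.
    process:     callback invoked once per template handle.
    Returns the set of template handles that were processed.
    """
    visited = set()

    def walk(node):
        for child in children_of.get(node, []):
            # Duplicate check: a template shared by several pages or
            # templates must not be edited a second time.
            if child in visited:
                continue
            visited.add(child)
            process(child)
            # Templates can contain further templates, so recurse.
            walk(child)

    walk(root)
    return visited
```

If several pages are processed, the visited set would have to be shared across all of them instead of being created per call, because the same template can be displayed by more than one page.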
Organization of text objects
A text object consists of a transformation matrix and the text. Several other properties are taken from
the current graphics state such as the font, font size, character spacing, word spacing and so on.
