com.etymon.pjx
Class PdfReader

java.lang.Object
  extended by com.etymon.pjx.PdfReader

public final class PdfReader
extends java.lang.Object

Reads a PDF document. Most applications do not need to access methods in this class but should instead go through PdfManager. This class is synchronized.


Nested Class Summary
protected  class PdfReader.ArrayEnd
          A placeholder used by the PDF parser to mark the end of an array.
protected  class PdfReader.DictionaryEnd
          A placeholder used by the PDF parser to mark the end of a dictionary.
protected  class PdfReader.DictionaryEndStream
          A placeholder used by the PDF parser to mark the end of a dictionary that is also followed by a stream.
protected  class PdfReader.ParserObject
          The superclass of inner classes used by this PdfReader to mark positions while parsing PDF objects.
 
Field Summary
protected static java.util.regex.Pattern _patHeader
          The regular expression that matches a PDF header.
protected static java.util.regex.Pattern _patObjIntro
          The regular expression that matches the beginning of an indirect object (specifically, the object number and generation number followed by "obj").
protected static java.util.regex.Pattern _patPdfObject
          The regular expression that matches a PDF (direct) object.
protected static java.util.regex.Pattern _patStartxref
          The regular expression that matches a startxref section.
protected static java.util.regex.Pattern _patXref
          The regular expression that matches the beginning of an xref section (specifically, the "xref" key word).
protected static java.util.regex.Pattern _patXrefEof
          The regular expression that matches an entire xref table section, including the "trailer" key word.
protected static java.util.regex.Pattern _patXrefSub
          The regular expression that matches the introduction to a subsection of an xref section (specifically, an integer pair) or the "trailer" key word.
protected static java.util.regex.Pattern _patXrefTable
          The regular expression that matches an entire xref table section, including the "trailer" key word.
protected  PdfInput _pdfInput
           
protected static PdfName PDFNAME_LENGTH
          A PdfName object representing the name Length.
protected static PdfName PDFNAME_PREV
          A PdfName object representing the name Prev.
protected static PdfName PDFNAME_SIZE
          A PdfName object representing the name Size.
protected static java.lang.String REGEX_ANY_CHAR
          The regular expression that matches literally any character.
protected static java.lang.String REGEX_COMMENT
          The regular expression that matches a comment in PDF.
protected static java.lang.String REGEX_DELIMITER
          The regular expression that matches a delimiter in PDF.
protected static java.lang.String REGEX_EOL
          The regular expression that matches an end-of-line (EOL) marker in PDF.
protected static java.lang.String REGEX_REGULAR
          The regular expression that matches a regular character in PDF.
protected static java.lang.String REGEX_STOP
          The regular expression that matches a white-space or delimiter (stopping syntactic entities) in PDF.
protected static java.lang.String REGEX_WHITESPACE
          The regular expression that matches general white-space in PDF.
protected static int STARTXREF_RETRY_COUNT
          Number of times to try scanning for startxref.
protected static int STARTXREF_RETRY_SCAN
          The number of bytes from the end of a PDF document at which to start scanning for startxref.
 
Constructor Summary
PdfReader(PdfInput pdfInput)
          Creates a reader for a PDF document to be read from a PdfInput source.
 
Method Summary
 void close()
          Closes the PDF document and releases any system resources associated with it.
 PdfInput getInput()
          Returns the PdfInput instance associated with this document.
protected  PdfInput getPdfInput()
           
protected  PdfObject parseObject(long start, long end, java.nio.CharBuffer cbuf, XrefTable xt)
          Parses and returns a PDF object from the input source.
 java.lang.String readHeader()
          Reads the header of the PDF document.
 PdfObject readObject(long start, long end, boolean indirect, XrefTable xt)
          Reads a PDF object from the document.
protected  XrefTable readPartialXrefTable(XrefTable xt, long startxref, long[] prev)
          Reads an individual (partial) cross-reference table and trailer dictionary from the PDF document.
 long readStartxref()
          Reads the startxref value from the PDF document.
 XrefTable readXrefTable(long startxref)
          Reads and compiles all cross-reference tables and trailer dictionaries from the PDF document beginning at a specified position.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_pdfInput

protected PdfInput _pdfInput

_patHeader

protected static java.util.regex.Pattern _patHeader
The regular expression that matches a PDF header.
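
As a rough illustration of what matching a PDF header involves, the sketch below looks for "%PDF-" followed by a major.minor version at the very start of the input. The class name, method name, and the pattern itself are assumptions made for this example; the actual expression compiled into _patHeader is not shown in this documentation and may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderSketch {
    // Hypothetical pattern (not the library's _patHeader):
    // the literal "%PDF-" followed by a version such as 1.4.
    static final Pattern HEADER = Pattern.compile("%PDF-(\\d+\\.\\d+)");

    // Returns the header text found at the start of the input, or null if there is none.
    static String findHeader(CharSequence head) {
        Matcher m = HEADER.matcher(head);
        // lookingAt() anchors the match at the beginning without requiring a full match.
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findHeader("%PDF-1.4\n1 0 obj\n"));
    }
}
```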


_patObjIntro

protected static final java.util.regex.Pattern _patObjIntro
The regular expression that matches the beginning of an indirect object (specifically, the object number and generation number followed by "obj").
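
The introduction of an indirect object can be pictured with a small stand-alone sketch: two unsigned integers and the word "obj", separated by PDF white-space. Everything here (class name, pattern, return shape) is an assumption for illustration and is not taken from the library's source; in particular, this simplified pattern does not check what follows "obj".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ObjIntroSketch {
    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    static final String WS = "[\\x00\\t\\n\\f\\r ]";

    // Hypothetical pattern (not the library's _patObjIntro).
    static final Pattern OBJ_INTRO =
            Pattern.compile("(\\d+)" + WS + "+(\\d+)" + WS + "+obj");

    // Returns {objectNumber, generation, offsetAfterIntro}, or null if the
    // buffer does not start with an indirect object introduction.
    static long[] parseIntro(CharSequence buf) {
        Matcher m = OBJ_INTRO.matcher(buf);
        if (!m.lookingAt()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            m.end()
        };
    }

    public static void main(String[] args) {
        long[] r = parseIntro("12 0 obj\n<< >>\nendobj");
        System.out.println(r[0] + " " + r[1] + " body starts at " + r[2]);
    }
}
```

The returned end offset corresponds to the idea of advancing past the introduction before the direct object itself is parsed.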


_patPdfObject

protected static final java.util.regex.Pattern _patPdfObject
The regular expression that matches a PDF (direct) object.


_patStartxref

protected static final java.util.regex.Pattern _patStartxref
The regular expression that matches a startxref section.


_patXref

protected static final java.util.regex.Pattern _patXref
The regular expression that matches the beginning of an xref section (specifically, the "xref" key word).


_patXrefSub

protected static final java.util.regex.Pattern _patXrefSub
The regular expression that matches the introduction to a subsection of an xref section (specifically, an integer pair) or the "trailer" key word.
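
To make the "integer pair or trailer" alternative concrete, here is a minimal sketch that classifies one line of an xref section. The pattern and names are hypothetical and only approximate what _patXrefSub is described as matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XrefSubSketch {
    // Hypothetical pattern (not the library's _patXrefSub):
    // either "<first> <count>" or the keyword "trailer".
    static final Pattern XREF_SUB = Pattern.compile("(\\d+) +(\\d+)|(trailer)");

    // Returns {firstObjectNumber, entryCount} for a subsection header,
    // an empty array for "trailer", or null for anything else.
    static long[] parseSubHeader(String line) {
        Matcher m = XREF_SUB.matcher(line.trim());
        if (!m.matches()) {
            return null; // e.g. an xref entry such as "0000000017 00000 n"
        }
        if (m.group(3) != null) {
            return new long[0];
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }

    public static void main(String[] args) {
        System.out.println(parseSubHeader("0 6").length);
        System.out.println(parseSubHeader("trailer").length);
    }
}
```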


_patXrefTable

protected static final java.util.regex.Pattern _patXrefTable
The regular expression that matches an entire xref table section, including the "trailer" key word.


_patXrefEof

protected static final java.util.regex.Pattern _patXrefEof
The regular expression that matches an entire xref table section, including the "trailer" key word.


PDFNAME_LENGTH

protected static final PdfName PDFNAME_LENGTH
A PdfName object representing the name Length.


PDFNAME_PREV

protected static final PdfName PDFNAME_PREV
A PdfName object representing the name Prev.


PDFNAME_SIZE

protected static final PdfName PDFNAME_SIZE
A PdfName object representing the name Size.


REGEX_ANY_CHAR

protected static final java.lang.String REGEX_ANY_CHAR
The regular expression that matches literally any character.

See Also:
Constant Field Values

REGEX_COMMENT

protected static final java.lang.String REGEX_COMMENT
The regular expression that matches a comment in PDF.

See Also:
Constant Field Values

REGEX_DELIMITER

protected static final java.lang.String REGEX_DELIMITER
The regular expression that matches a delimiter in PDF.

See Also:
Constant Field Values

REGEX_EOL

protected static final java.lang.String REGEX_EOL
The regular expression that matches an end-of-line (EOL) marker in PDF.

See Also:
Constant Field Values
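
In PDF an end-of-line marker is a carriage return, a line feed, or a carriage return immediately followed by a line feed. The sketch below splits text on such markers; the pattern is an assumption for illustration, since the value of REGEX_EOL is listed under Constant Field Values rather than here.

```java
import java.util.regex.Pattern;

class EolSketch {
    // Hypothetical pattern (not necessarily equal to REGEX_EOL).
    // CRLF is listed first so that it is consumed as a single marker.
    static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    // Splits text into lines on any PDF end-of-line marker.
    static String[] splitLines(CharSequence text) {
        return EOL.split(text);
    }

    public static void main(String[] args) {
        for (String line : splitLines("xref\r\n0 6\rtrailer\n<< >>")) {
            System.out.println("[" + line + "]");
        }
    }
}
```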

REGEX_REGULAR

protected static final java.lang.String REGEX_REGULAR
The regular expression that matches a regular character in PDF.

See Also:
Constant Field Values

REGEX_STOP

protected static final java.lang.String REGEX_STOP
The regular expression that matches a white-space or delimiter (stopping syntactic entities) in PDF.

See Also:
Constant Field Values

REGEX_WHITESPACE

protected static final java.lang.String REGEX_WHITESPACE
The regular expression that matches general white-space in PDF.

See Also:
Constant Field Values
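
The white-space, delimiter, regular, and "stop" categories used by the REGEX_ constants can be summarized in a few predicates. This is a sketch based on the PDF character classes (white-space is NUL, TAB, LF, FF, CR, SPACE; delimiters are ( ) < > [ ] { } / %), with hypothetical class and method names; it does not reproduce the library's regular expressions.

```java
class PdfCharClassSketch {
    // PDF white-space: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D || c == 0x20;
    }

    // PDF delimiters.
    static boolean isDelimiter(int c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    // A "stop" character ends a syntactic entity such as a name or number.
    static boolean isStop(int c) {
        return isWhitespace(c) || isDelimiter(c);
    }

    // Regular characters are everything that is not a stop character.
    static boolean isRegular(int c) {
        return !isStop(c);
    }

    // Returns the index of the first stop character at or after start,
    // or the length of the text if the token runs to the end.
    static int tokenEnd(CharSequence s, int start) {
        int i = start;
        while (i < s.length() && isRegular(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String text = "/Length 42";
        System.out.println(text.substring(1, tokenEnd(text, 1)));
    }
}
```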

STARTXREF_RETRY_COUNT

protected static final int STARTXREF_RETRY_COUNT
The number of times to try scanning for startxref. On each retry the parser backs up a further STARTXREF_RETRY_SCAN bytes before the previous starting point.

See Also:
Constant Field Values

STARTXREF_RETRY_SCAN

protected static final int STARTXREF_RETRY_SCAN
The number of bytes from the end of a PDF document at which to start scanning for startxref.

See Also:
Constant Field Values
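
The interplay of the two STARTXREF_ constants can be sketched as a backward scan that widens its window on each attempt. The code below works on an in-memory byte array and uses its own parameters and pattern; the names, the pattern, and the choice to return -1 on failure are assumptions for this example, not the behavior of readStartxref().

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartxrefScanSketch {
    // Hypothetical pattern (not the library's _patStartxref).
    static final Pattern STARTXREF = Pattern.compile("startxref\\s+(\\d+)\\s+%%EOF");

    // Scans the tail of the document for startxref. Attempt n looks at the
    // last n * retryScan bytes. Returns the offset, or -1 if none is found.
    static long findStartxref(byte[] pdf, int retryScan, int retryCount) {
        for (int attempt = 1; attempt <= retryCount; attempt++) {
            int from = Math.max(0, pdf.length - attempt * retryScan);
            // ISO-8859-1 maps each byte to one char, so offsets are preserved.
            String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
            Matcher m = STARTXREF.matcher(tail);
            long found = -1;
            while (m.find()) {
                found = Long.parseLong(m.group(1)); // keep the last occurrence
            }
            if (found >= 0) {
                return found;
            }
            if (from == 0) {
                break; // the whole document has been scanned
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n...\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartxref(pdf, 1024, 3));
    }
}
```

Widening the window only when needed keeps the common case cheap while still tolerating files with trailing data after the end-of-file marker.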
Constructor Detail

PdfReader

public PdfReader(PdfInput pdfInput)
Creates a reader for a PDF document to be read from a PdfInput source.

Parameters:
pdfInput - the source to read the PDF document from.
Method Detail

getInput

public PdfInput getInput()
Returns the PdfInput instance associated with this document.


getPdfInput

protected PdfInput getPdfInput()

close

public void close()
           throws java.io.IOException
Closes the PDF document and releases any system resources associated with it.

Throws:
java.io.IOException

parseObject

protected PdfObject parseObject(long start,
                                long end,
                                java.nio.CharBuffer cbuf,
                                XrefTable xt)
                         throws java.io.IOException,
                                PdfFormatException
Parses and returns a PDF object from the input source. The object is filtered through PdfReaderFilter. This method may return null if the filtering method discards all objects. It is intended to be called from readObject(), which advances the buffer position past the introduction if the object is indirect.

Parameters:
start - the offset where the object starts.
end - the offset where the object ends.
cbuf - the character buffer cached from readObject().
xt - the cross-reference table; used for resolving indirect references.
Throws:
PdfFormatException
java.io.IOException

readPartialXrefTable

protected XrefTable readPartialXrefTable(XrefTable xt,
                                         long startxref,
                                         long[] prev)
                                  throws java.io.IOException,
                                         PdfFormatException
Reads an individual (partial) cross-reference table and trailer dictionary from the PDF document. The trailer dictionary is filtered through PdfReaderFilter. This method should be made public.

Parameters:
xt - an existing XrefTable object to add data to; assumed to be "subsequent" to (more recent than) the table that is about to be read. Only entries that do not already exist are added. The trailer is not modified.
startxref - the xref start position.
prev - the current Prev offset.
Returns:
the cross-reference table and trailer.
Throws:
java.io.IOException
PdfFormatException

readHeader

public java.lang.String readHeader()
                            throws java.io.IOException,
                                   PdfException
Reads the header of the PDF document.

Returns:
the PDF document header.
Throws:
java.io.IOException
PdfException

readObject

public PdfObject readObject(long start,
                            long end,
                            boolean indirect,
                            XrefTable xt)
                     throws java.io.IOException,
                            PdfFormatException
Reads a PDF object from the document. The object is filtered through PdfReaderFilter. It is possible for this method to return null if the filtering method discards all objects.

Parameters:
start - the offset where the object starts.
end - the offset where the object ends.
indirect - true if the object is preceded by the object number, generation, and "obj".
xt - the PDF document's cross-reference table.
Returns:
the PDF object.
Throws:
java.io.IOException
PdfFormatException

readStartxref

public long readStartxref()
                   throws java.io.IOException,
                          PdfFormatException
Reads the startxref value from the PDF document.

Returns:
the startxref value.
Throws:
java.io.IOException
PdfFormatException

readXrefTable

public XrefTable readXrefTable(long startxref)
                        throws java.io.IOException,
                               PdfFormatException
Reads and compiles all cross-reference tables and trailer dictionaries from the PDF document beginning at a specified position. The most recent trailer dictionary is filtered through PdfReaderFilter.

Parameters:
startxref - the xref start position.
Returns:
the cross-reference table and trailer.
Throws:
java.io.IOException
PdfFormatException
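
Compiling several cross-reference sections amounts to following each trailer's Prev offset back through the file and keeping, for every object number, the entry from the most recent section that defines it. The sketch below models that with plain maps. Section, compile, and the map-based "file" are hypothetical stand-ins and do not correspond to XrefTable or to any PdfReader method.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class XrefChainSketch {
    // Hypothetical stand-in for one xref section and its trailer:
    // object number -> byte offset, plus the Prev offset (-1 if absent).
    static final class Section {
        final Map<Integer, Long> offsets;
        final long prev;

        Section(Map<Integer, Long> offsets, long prev) {
            this.offsets = offsets;
            this.prev = prev;
        }
    }

    // Walks the Prev chain starting at startxref, newest section first.
    // Entries already present are never overwritten, so newer sections win.
    static Map<Integer, Long> compile(Map<Long, Section> sectionsByOffset, long startxref) {
        Map<Integer, Long> merged = new HashMap<>();
        Set<Long> visited = new HashSet<>(); // guards against a cyclic Prev chain
        long pos = startxref;
        while (pos >= 0 && visited.add(pos)) {
            Section s = sectionsByOffset.get(pos);
            if (s == null) {
                throw new IllegalArgumentException("no xref section at offset " + pos);
            }
            for (Map.Entry<Integer, Long> e : s.offsets.entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
            pos = s.prev;
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Integer, Long> older = new HashMap<>();
        older.put(1, 10L);
        older.put(2, 20L);
        Map<Integer, Long> newer = new HashMap<>();
        newer.put(2, 300L);
        Map<Long, Section> file = new HashMap<>();
        file.put(100L, new Section(older, -1));
        file.put(500L, new Section(newer, 100));
        System.out.println(compile(file, 500));
    }
}
```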