PdfDocumentReader unable to read PDF

description

Hi Kozw,

We are using DocumentDataSource in our project and are trying to read a PDF document. Earlier we were using XPS documents, and everything except range printing worked fine. As part of another requirement, we are trying to read and display PDF documents in Silverlight.

We are using the PdfDocumentReader class from FirstFloor.Documents.Pdf for this. We have a mix of PDF documents, some with "Password Security" (PDF -> Properties -> Security tab) and some with "No Security".
We are unable to read the documents because we get a PdfParseException.

While debugging the latest Document Toolkit library (Document Toolkit Extensions 2.5.2.0) to investigate further, I found the places where exceptions are thrown:

1) The ParseDocument method in PdfParser.cs, at Encrypt.Get
2) DecodeFilter in FlateDecode.cs, in the predictor == 1 case
3) XpsClientException
and many more.

Can you please help me read all kinds of PDF documents? Does the existing library require code changes to support PDFs with different properties?

I am attaching PDF files for your reference: the first one has password security and the second one has no security (casestudy_031103.pdf).
  • Varsha Hangire

comments

varshaH wrote Apr 14, 2014 at 10:28 AM

For some PDFs we get the following error: FirstFloor.Documents.IO.XpsClientException: Failed to load package part '4.fpage' ---> FirstFloor.Documents.Pdf.PdfParseException: Stream filter 'LZWDecode' is not supported

kozw wrote Apr 14, 2014 at 11:21 AM

The open source extension that adds PDF support to Document Toolkit is incomplete. Only part of the PDF specification is implemented; password-protected PDFs, for instance, are not supported.
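
Since encrypted files and some stream filters are not supported, one option is to screen files before handing them to the reader. The following is a minimal sketch in Python, not part of Document Toolkit. It assumes a raw byte scan is good enough: it looks for an `/Encrypt` entry and for the `/LZWDecode` filter name. The function name `preflight_pdf` and the filter list are hypothetical; LZWDecode is the only filter confirmed as unsupported in this thread. It is a heuristic, so names written with `#` escapes are missed and a chance match inside binary stream data gives a false positive.

```python
import re

# Filters reported as unsupported in this thread (assumption: extend as needed).
UNSUPPORTED_FILTERS = (b"LZWDecode",)


def preflight_pdf(data: bytes) -> list:
    """Return reasons the PDF may fail to load; an empty list means none found."""
    # The header is expected near the start of the file.
    if b"%PDF-" not in data[:1024]:
        return ["not a PDF (missing %PDF- header)"]

    problems = []
    # An /Encrypt key in the trailer marks a file that uses a security handler.
    # The word boundary keeps /EncryptMetadata from matching.
    if re.search(rb"/Encrypt\b", data):
        problems.append("encrypted (has /Encrypt entry)")

    # Stream dictionaries are stored uncompressed, so filter names are visible.
    for name in UNSUPPORTED_FILTERS:
        if re.search(rb"/" + re.escape(name) + rb"\b", data):
            problems.append("uses stream filter " + name.decode("ascii"))
    return problems
```

A caller could run this on the uploaded bytes and fall back to XPS conversion whenever the returned list is not empty.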

There are currently no plans to further improve the PDF renderer. At its core Document Toolkit is an XPS reader, and that implementation is fairly complete. It is therefore recommended to use Document Toolkit for XPS.

varshaH wrote Apr 14, 2014 at 12:08 PM

Hi Kozw,

Thank you for your prompt reply. I was able to read the PDF mentioned at the following link, both in a POC and in our SL application. Ref - https://documenttoolkit.codeplex.com/wikipage?title=How%20To%20View%20PDF%20Documents&referringTitle=Document%20Toolkit%20Tutorials

Actually, our client's requirement is to render documents with 500-1000+ pages. In this case the XPS conversion takes too much time, and the client considers it a performance issue. So we tried PDF conversion instead, as it takes comparatively little time even for a document with 1000+ pages.

If there are limitations on using PDF, can you please suggest a solution for XPS so that we can bring down the conversion time, for example through buffering or on-demand loading while scrolling?

This is our priority item and any help will be appreciated a lot.

Apart from this, we are still stuck on the print issue and are thinking of using JavaScript, or of loading the document as an embedded source in a separate iframe and then using the native print API. Any further ideas or guidance on this would be very helpful for us.

Thank you,
Varsha

varshaH wrote Apr 14, 2014 at 12:52 PM

The PDF document I attached in my first post is not a password-protected document, and if I compare its properties with those of the one you use in your POC, i.e. "TestDocument.Pdf", they are almost the same.

kozw wrote Apr 14, 2014 at 11:56 PM

The open source PDF renderer works best with OpenType fonts and JPEG images. If your PDF files contain these, you have the best chance of good render results.

Document Toolkit does include an on-demand load approach that works great with large XPS documents. The proposed solution includes client and server logic, where the server sends parts of the document to the client. See the [WebPackageReader2](https://documenttoolkit.codeplex.com/SourceControl/latest#2.5/Document Toolkit Extensions/Client/FirstFloor.Documents.IO/WebPackageReader2.cs) and the [GetDocumentPartHandler](https://documenttoolkit.codeplex.com/SourceControl/latest#2.5/Document Toolkit Extensions/Server/FirstFloor.Documents.Services/RequestHandlers/GetDocumentPartHandler.cs) for details.
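
To illustrate the on-demand idea in general terms, here is a small Python sketch of a client-side part cache. It does not reproduce WebPackageReader2 or GetDocumentPartHandler; the class name `OnDemandPartCache`, the `fetch_part` callback, and the cache size are all hypothetical. It assumes the server can return a single document part by name, so the client only requests a part when a page actually needs it and keeps the most recently used parts in memory.

```python
from collections import OrderedDict
from typing import Callable


class OnDemandPartCache:
    """Fetches document parts lazily and keeps the most recently used ones."""

    def __init__(self, fetch_part: Callable[[str], bytes], max_parts: int = 32):
        # fetch_part stands in for a request to the server (hypothetical).
        self._fetch_part = fetch_part
        self._max_parts = max_parts
        self._parts = OrderedDict()

    def get_part(self, name: str) -> bytes:
        if name in self._parts:
            # Cache hit: mark the part as recently used, no server round trip.
            self._parts.move_to_end(name)
            return self._parts[name]

        # Cache miss: the part is requested only now, when a page needs it.
        data = self._fetch_part(name)
        self._parts[name] = data
        if len(self._parts) > self._max_parts:
            # Evict the least recently used part to bound memory use.
            self._parts.popitem(last=False)
        return data
```

With a viewer that asks for parts as the user scrolls, only the visible pages and their resources travel over the wire, which is why viewing time drops for large documents even though conversion time does not.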

As far as printing is concerned, the printing API does have its limitations, and your suggestion of loading the document in an iframe might very well work. I do not have experience with that approach.

varshaH wrote Apr 16, 2014 at 9:45 AM

Hi Kozw,

There are two things. One is document conversion, e.g. from PDF to XPS, wherein the PDF is converted into PNGs and all PNGs are bundled into a single XPS document.

The second part is loading/viewing the XPS document. We tried the client-server approach you proposed in the post above. The time required to view big documents has been reduced to some extent, but this has not solved our actual problem.

So for the document conversion part, is there any solution wherein the XPS document is created on demand and made ready for loading as soon as the first few PNGs are ready? Currently it takes 15-20 minutes to convert a PDF with 1000+ pages to an XPS document. In the meantime the session times out and the conversion fails.

Varsha

varshaH wrote Apr 16, 2014 at 10:34 AM

Kozw, we are using our own conversion services to convert the documents into XPS. So I am reframing my question: is there any conversion mechanism that Document Toolkit provides? If so, does it support all types of documents, such as PDF, Office, TIFF, etc.?

kozw wrote Apr 16, 2014 at 5:16 PM

The PDF reader of Document Toolkit implements an in-memory conversion from PDF to XPS. Document Toolkit only understands XPS, and the PDF renderer provides the XPS parts based on the PDF file. The same goes for TIFF files: a TIFF reader exists that converts TIFF images to XPS documents. Again, this is done in-memory.
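
The same principle can be applied to a custom conversion service: convert a page only when its part is first requested, then reuse the result. The Python sketch below illustrates the idea and is not the toolkit's code. The class `LazyPageConverter` and the `convert_page` callback are hypothetical, and it assumes 1-based part names of the form `<n>.fpage`, modelled on the '4.fpage' part name in the error message quoted earlier in this thread.

```python
from typing import Callable


class LazyPageConverter:
    """Converts a page the first time its part is requested, then reuses it."""

    def __init__(self, page_count: int, convert_page: Callable[[int], bytes]):
        self.page_count = page_count
        # convert_page is a hypothetical per-page converter (e.g. PDF page -> XPS part).
        self._convert_page = convert_page
        self._converted = {}

    def get_page_part(self, part_name: str) -> bytes:
        # Assumed naming: '<page number>.fpage', with pages numbered from 1.
        stem, _, ext = part_name.rpartition(".")
        if ext != "fpage" or not (stem.isascii() and stem.isdigit()):
            raise KeyError(part_name)
        page = int(stem)
        if not 1 <= page <= self.page_count:
            raise KeyError(part_name)

        # Pay the conversion cost only for pages that are actually viewed.
        if page not in self._converted:
            self._converted[page] = self._convert_page(page)
        return self._converted[page]
```

With this shape the first page is available after one page conversion instead of after the whole document, so a 1000+ page file does not have to finish converting before the session can show anything.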

For MS Office, Document Toolkit uses the save-to-xps feature included in Office. This requires elevated permission for the app since Document Toolkit uses COM automation to communicate with Office APIs.

All non-XPS readers are available as open source in this Document Toolkit Extensions project.