A standalone Java library/command line tool that converts DOC, DOCX, PPT, PPTX and ODT documents to PDF files.

maveninventing/docs-to-pdf-converter

Docs to PDF Converter

A standalone Java library/command line tool that converts DOC, DOCX, PPT, PPTX and ODT documents to PDF files. (Requires JRE 6)

Why?
I wanted a simple program that converts Microsoft Office documents to PDF without depending on LibreOffice or expensive proprietary solutions. Since the code and libraries for converting each individual format are scattered around the web, I decided to combine those solutions into a single program. Along the way, I added ODT support as well, since I had come across code for it too.

Command Line Usage:

java -jar doc-converter.jar [-type "type"] -input "path" [-output "path"] [-verbose]

java -jar doc-converter.jar -input test.doc
java -jar doc-converter.jar -i test.ppt -o ~/output.pdf
java -jar doc-converter.jar -i ~/no-extension-file -o ~/output.pdf -t docx

Parameters:

-inputPath (-i, -in, -input) "path" : Specifies the path to the input file.

-outputPath (-o, -out, -output) "path" : Specifies the path for the output PDF. If omitted, the PDF is written to the input file's directory, using the input file's name with a .pdf extension. (Optional)

-type (-t) [DOC | DOCX | PPT | PPTX | ODT] : Specifies which converter to use. If omitted, the program infers the type from the file extension. (Optional)

-verbose (-v) : Prints intermediate processing messages. (Optional)
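
The defaults described for -output and -type can be expressed in a few lines of Java. This is a hypothetical sketch, not the tool's actual source: the class name CliOptions and its methods are illustrative, flag spellings are taken from the parameter list above and assumed to be case-sensitive, and the -type value is assumed to be case-insensitive because the examples pass "-t docx".

```java
import java.io.File;
import java.util.Locale;

/**
 * Hypothetical sketch (not the tool's actual source) of how the documented
 * flags, aliases and defaults could be resolved.
 */
class CliOptions {

    /** The five values listed for -type. */
    enum DocType { DOC, DOCX, PPT, PPTX, ODT }

    String input;
    String output;
    DocType type;
    boolean verbose;

    static CliOptions parse(String[] args) {
        CliOptions o = new CliOptions();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (matches(arg, "-inputPath", "-i", "-in", "-input")) {
                o.input = valueAfter(args, i++); // i++ skips over the consumed value
            } else if (matches(arg, "-outputPath", "-o", "-out", "-output")) {
                o.output = valueAfter(args, i++);
            } else if (matches(arg, "-type", "-t")) {
                // Assumed case-insensitive, since the examples pass "-t docx".
                o.type = DocType.valueOf(valueAfter(args, i++).toUpperCase(Locale.ROOT));
            } else if (matches(arg, "-verbose", "-v")) {
                o.verbose = true;
            } else {
                throw new IllegalArgumentException("Unknown argument: " + arg);
            }
        }
        if (o.input == null) {
            throw new IllegalArgumentException("An input path is required");
        }
        if (o.type == null) {
            o.type = inferType(o.input); // -type omitted: fall back to the extension
        }
        if (o.type == null) {
            throw new IllegalArgumentException("Cannot infer the type of " + o.input + "; pass -type");
        }
        if (o.output == null) {
            o.output = defaultOutputPath(o.input); // -output omitted
        }
        return o;
    }

    /** Maps a file extension such as "docx" to a DocType, or returns null. */
    static DocType inferType(String path) {
        String ext = extensionOf(new File(path).getName());
        if (ext != null) {
            for (DocType t : DocType.values()) {
                if (t.name().equalsIgnoreCase(ext)) {
                    return t;
                }
            }
        }
        return null;
    }

    /** Same directory and base name as the input, with a .pdf extension. */
    static String defaultOutputPath(String path) {
        File in = new File(path);
        String name = in.getName();
        String ext = extensionOf(name);
        String base = (ext == null) ? name : name.substring(0, name.length() - ext.length() - 1);
        return new File(in.getParentFile(), base + ".pdf").getPath();
    }

    private static boolean matches(String arg, String... names) {
        for (String name : names) {
            if (arg.equals(name)) {
                return true;
            }
        }
        return false;
    }

    private static String valueAfter(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException("Missing value after " + args[i]);
        }
        return args[i + 1];
    }

    /** Text after the last dot, or null if there is none. */
    private static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return (dot <= 0 || dot == name.length() - 1) ? null : name.substring(dot + 1);
    }
}
```

For example, CliOptions.parse(new String[] {"-i", "test.ppt"}) resolves to type PPT and output test.pdf, while a file without an extension is rejected unless -type is given.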

Library Usage:

  1. Drop the jar into your lib folder and add it to the build path.
  2. Choose the converter for your input format: DocToPDFConverter, DocxToPDFConverter, PptToPDFConverter, PptxToPDFConverter or OdtToPDFConverter.
  3. Instantiate it with 4 parameters:
    3a: InputStream inStream: Source stream of the document to convert
    3b: OutputStream outStream: Stream the PDF is written to
    3c: boolean showMessages: Whether to print intermediate processing messages to standard output (stdout)
    3d: boolean closeStreamsWhenComplete: Whether to close the input and output streams when conversion completes
  4. Call the "convert()" method and wait for it to finish.
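
The steps above come down to: open the streams, choose the two flags, construct the converter, call convert(). Because this README does not state the package to import or which exceptions convert() declares, the sketch below runs the same calling pattern against a hypothetical StandInConverter. It has the documented constructor shape but only copies bytes; it is not part of the library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * HYPOTHETICAL stand-in, not a library class. It mirrors the documented
 * (inStream, outStream, showMessages, closeStreamsWhenComplete) constructor
 * and convert() call, but merely copies bytes so the pattern runs without the jar.
 */
class StandInConverter {
    private final InputStream inStream;
    private final OutputStream outStream;
    private final boolean showMessages;
    private final boolean closeStreamsWhenComplete;

    StandInConverter(InputStream inStream, OutputStream outStream,
                     boolean showMessages, boolean closeStreamsWhenComplete) {
        this.inStream = inStream;
        this.outStream = outStream;
        this.showMessages = showMessages;
        this.closeStreamsWhenComplete = closeStreamsWhenComplete;
    }

    void convert() throws IOException {
        if (showMessages) {
            System.out.println("Converting...");
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = inStream.read(buffer)) != -1) {
            outStream.write(buffer, 0, n); // a real converter would write PDF bytes here
        }
        outStream.flush();
        if (closeStreamsWhenComplete) {
            inStream.close();
            outStream.close();
        }
    }

    /** The calling pattern: build streams, pass the two flags, call convert(). */
    static byte[] convertInMemory(byte[] document) throws IOException {
        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        StandInConverter converter = new StandInConverter(
                new ByteArrayInputStream(document), pdf,
                false,  // showMessages: keep stdout quiet
                true);  // closeStreamsWhenComplete: let convert() close both streams
        converter.convert();
        // Still readable: closing a ByteArrayOutputStream has no effect.
        return pdf.toByteArray();
    }
}
```

With the jar on the build path, the stand-in would be replaced by one of the converters named in step 2, for example new DocxToPDFConverter(new FileInputStream("in.docx"), new FileOutputStream("out.pdf"), false, true).convert(); check the library source for the package to import and the exceptions to handle.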

Caveats and technical details:
This tool relies on the Apache POI, xdocreport, docx4j and odfdom libraries. They are not 100% reliable, and the output formatting may not always match the original document.

DOC:
Generally OK, but conversion takes some time. After conversion, paragraph spacing tends to increase, which can affect the page layout. docx4j converts the DOC to DOCX, and then the DOCX to PDF.
(xdocreport cannot be used for the second step because the intermediate DOCX data structure is docx4j-specific.)

DOCX:
Very good results and fast conversion. Conversion uses the xdocreport library, as it appears faster and more accurate than docx4j.

PPT and PPTX:
The resulting PDF consists of one embedded PNG image per page. This should be good enough for printing. It is a limitation of the Apache POI and docx4j libraries.
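
The image-per-page approach for PPT and PPTX can be sketched with the JDK alone: rasterize a slide into a BufferedImage, encode it as PNG, and compute a uniform scale that fits the image on the PDF page. This is a minimal sketch under stated assumptions: the slide drawing call (assumed here to be the draw(Graphics2D) method on Apache POI slide objects) and the PDF embedding step are omitted, SlideRaster is a hypothetical name, and the 720 x 540 pt slide and A4 landscape page (842 x 595 pt) are example sizes only.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical sketch of the "one PNG per PDF page" step; not the tool's source. */
class SlideRaster {

    /** Uniform scale that fits an imgW x imgH image inside a page, preserving aspect ratio. */
    static double fitScale(double imgW, double imgH, double pageW, double pageH) {
        return Math.min(pageW / imgW, pageH / imgH);
    }

    /** Rasterizes a slide of slideW x slideH points at the given zoom and returns PNG bytes. */
    static byte[] renderPng(int slideW, int slideH, double zoom) throws IOException {
        int w = (int) Math.round(slideW * zoom);
        int h = (int) Math.round(slideH * zoom);
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.scale(zoom, zoom); // draw in slide coordinates; a higher zoom gives a sharper print
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, slideW, slideH);
            // Placeholder: a real converter would draw the slide content here
            // (assumed: slide.draw(g) with Apache POI).
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }
}
```

Because each page is a bitmap, print quality depends on the zoom chosen when rasterizing, which is the trade-off behind "good enough for printing".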

ODT:
Quality and speed are comparable to DOCX. Conversion uses odfdom from the Apache ODF Toolkit.

Main Libraries
Apache POI: https://poi.apache.org/
xdocreport: http://code.google.com/p/xdocreport/
docx4j: http://www.docx4java.org/
odfdom: https://incubator.apache.org/odftoolkit/odfdom/
and others.

The MIT License (MIT)
Copyright (c) 2013-2014 Yeo Kheng Meng
