Converting HTML to PDF: 6 Powerful Java Solutions for Developers

Converting HTML files to PDF format is a common requirement in many web applications. Whether you need to generate reports, invoices, or documentation, transforming web content into portable, print-friendly PDF documents provides significant advantages. This guide explores six Java-based approaches to HTML to PDF conversion, with code examples and notes to help you choose the right one for your project.

Table of Contents

  • Why Convert HTML to PDF?
  • Understanding the Challenges
  • Solution 1: iText Library
  • Solution 2: Flying Saucer with OpenPDF
  • Solution 3: wkhtmltopdf with Java wrapper
  • Solution 4: CSS2XSLFO with Apache FOP
  • Solution 5: JRex (Java API for Gecko)
  • Solution 6: PDFKit with JNI
  • Performance Comparison
  • Implementation Best Practices
  • Conclusion

Why Convert HTML to PDF?

Converting HTML to PDF combines the flexibility of web content creation with the reliability of PDF document distribution. PDFs keep consistent formatting across devices, print predictably, and can be password-protected or digitally signed. For Java developers working with web applications, HTML to PDF conversion enables automated report generation, document signing, and archiving.

Understanding the Challenges

HTML to PDF conversion presents several technical challenges:

  • Rendering consistency: Ensuring the PDF looks identical to the HTML page
  • Table layouts: Maintaining proper table structure and preventing content overflow
  • CSS support: Handling various CSS properties, especially complex layouts
  • Performance: Processing large documents efficiently
  • Font embedding: Managing fonts correctly in the resulting PDF

Let’s explore how different Java solutions address these challenges.

Solution 1: iText Library

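iText is a widely used Java PDF library that offers HTML to PDF conversion capabilities. Since version 5 it has been released under the AGPL, so closed-source commercial use requires a commercial license; it remains one of the most robust options. The example below uses the iText 5 API with the XML Worker add-on.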

import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;

// Uses the iText 5 API plus the XML Worker add-on (both AGPL-licensed)
public class ITextHtmlToPdf {
    public static void convertHtmlToPdf(String html, OutputStream outputStream) throws Exception {
        Document document = new Document(PageSize.A4);
        PdfWriter writer = PdfWriter.getInstance(document, outputStream);
        document.open();

        // XML Worker parses the input as XHTML, so it must be well-formed
        XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new StringReader(html));

        // Closing the document finishes the PDF and flushes it to the stream
        document.close();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Test Document</h1>" +
                      "<table><tr><td>Cell 1</td><td>Cell 2</td></tr></table></body></html>";
        // try-with-resources closes the file even if the conversion throws
        try (FileOutputStream output = new FileOutputStream("output.pdf")) {
            convertHtmlToPdf(html, output);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
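
The example builds its HTML by string concatenation. Once real data such as customer names or invoice lines goes into that string, a single & or < is enough to make the input malformed for XML Worker. A minimal sketch of an escaping helper, assuming the inserted values are plain text that should never contain markup; XhtmlEscaper and its methods are hypothetical names, not part of iText:

```java
// Hypothetical helper (not an iText class): escapes plain-text values for XHTML.
final class XhtmlEscaper {
    private XhtmlEscaper() {}

    // Replaces the five characters that are special in XML text and attribute values.
    // A single pass over the input means replacements are never re-scanned.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds one table cell whose content is escaped.
    static String td(String value) {
        return "<td>" + escape(value) + "</td>";
    }
}
```

With this helper, a data-driven cell is built as XhtmlEscaper.td(customerName) instead of raw concatenation. The same applies to Flying Saucer, which also parses its input as XML.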

Pros:

  • Mature library with extensive documentation
  • Good table handling capabilities
  • Extensive PDF manipulation features
  • Actively developed, though the iText 5 and XML Worker APIs used above are legacy; newer releases use the pdfHTML add-on

Cons:

  • Limited CSS support in XML Worker
  • AGPL license; closed-source commercial use requires a commercial license
  • Learning curve for advanced features

Solution 2: Flying Saucer with OpenPDF

Flying Saucer (also known as XHTML renderer) combined with OpenPDF offers excellent rendering quality, especially for document-centric HTML content.

import java.io.FileOutputStream;
import java.io.OutputStream;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class FlyingSaucerExample {
    // Input must be well-formed XHTML; Flying Saucer parses it as XML
    public static void convertToPdf(String html, OutputStream outputStream) throws Exception {
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocumentFromString(html);
        renderer.layout();                 // lay out pages using the document's CSS
        renderer.createPDF(outputStream);  // write the PDF (via OpenPDF with the flying-saucer-pdf-openpdf artifact)
    }

    public static void main(String[] args) {
        String html = "<html><head><style>table { border-collapse: collapse; }</style></head>" +
                     "<body><h1>Test Document</h1>" +
                     "<table border='1'><tr><td>Cell 1</td><td>Cell 2</td></tr></table></body></html>";
        // try-with-resources closes the file even if rendering fails
        try (FileOutputStream output = new FileOutputStream("flying-saucer-output.pdf")) {
            convertToPdf(html, output);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
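
Because Flying Saucer parses its input as XML, HTML that a browser tolerates, such as an unclosed <br>, makes setDocumentFromString fail. A minimal pre-check using only the JDK's XML parser, assuming the input is meant to be standalone XHTML and that external DTDs should not be fetched; XhtmlCheck and findProblem are hypothetical names:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML before rendering.
final class XhtmlCheck {
    private XhtmlCheck() {}

    // Returns null if the input is well-formed, otherwise a description of the first error.
    static String findProblem(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not download external DTDs (such as the XHTML DOCTYPE) during the check
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and is silent otherwise,
            // replacing the JDK default that prints "[Fatal Error]" to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xhtml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems
            return String.valueOf(e.getMessage());
        }
    }
}
```

Calling findProblem(html) before renderer.setDocumentFromString(html) turns an opaque rendering exception into a line and column you can log.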

Pros:

  • Better CSS support than basic iText
  • Open-source and free for commercial use
  • Good rendering quality for document-like content
  • Implements most of the CSS 2.1 specification

Cons:

  • Limited support for modern CSS features
  • No JavaScript support
  • May have issues with complex layouts
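
Table layout was listed earlier as a core challenge, and with Flying Saucer much of it is handled in CSS. A minimal sketch, assuming a Flying Saucer release that honors CSS 2.1 paged-media properties and its own -fs-table-paginate extension (check each property against the version you use); PrintableXhtml and wrap are hypothetical names, and the title and body markup must already be escaped:

```java
// Hypothetical helper: wraps body markup in XHTML with a print stylesheet for tables.
final class PrintableXhtml {
    private PrintableXhtml() {}

    // @page sets paper size and margins; -fs-table-paginate is a Flying Saucer
    // extension that repeats <thead> rows on every page a table spans;
    // page-break-inside: avoid asks the renderer not to split a row across pages.
    static final String PRINT_CSS =
        "@page { size: A4; margin: 20mm; }\n"
        + "table { width: 100%; border-collapse: collapse; table-layout: fixed;"
        + " -fs-table-paginate: paginate; }\n"
        + "thead { display: table-header-group; }\n"
        + "tr { page-break-inside: avoid; }\n"
        + "td, th { border: 1px solid #999; padding: 4px; }\n";

    // title and bodyMarkup are inserted verbatim, so they must already be escaped
    static String wrap(String title, String bodyMarkup) {
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>" + title + "</title>"
            + "<style type=\"text/css\">" + PRINT_CSS + "</style></head><body>"
            + bodyMarkup + "</body></html>";
    }
}
```

The result can be passed straight to convertToPdf in the example above; long tables then break across pages with their header row repeated.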

Solution 3: wkhtmltopdf with Java wrapper

wkhtmltopdf is a command-line tool that renders HTML with the Qt WebKit engine. By wrapping it in a small Java class, you can use its rendering from Java code.

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class WkHtmlToPdfWrapper {
    public static void convertHtmlToPdf(String html, String outputPath) throws IOException, InterruptedException {
        // Save HTML to a temp file as UTF-8
        File htmlFile = File.createTempFile("input", ".html");
        try {
            Files.write(htmlFile.toPath(), html.getBytes(StandardCharsets.UTF_8));

            // Arguments are passed as a list, so no shell is involved
            ProcessBuilder processBuilder = new ProcessBuilder(
                "wkhtmltopdf",
                htmlFile.getAbsolutePath(),
                outputPath
            );
            // Send wkhtmltopdf's progress output to this JVM's console; leaving the
            // pipes unread can block the child process once a pipe buffer fills
            processBuilder.inheritIO();

            // Execute and wait
            Process process = processBuilder.start();
            int exitCode = process.waitFor();

            if (exitCode != 0) {
                throw new IOException("wkhtmltopdf process failed with exit code: " + exitCode);
            }
        } finally {
            // Clean up even if the conversion fails
            htmlFile.delete();
        }
    }

    public static void main(String[] args) {
        try {
            String html = "<html><body><h1>WebKit Test</h1><table><tr><td>Cell 1</td></tr></table></body></html>";
            convertHtmlToPdf(html, "wkhtmltopdf-output.pdf");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Pros:

  • WebKit rendering engine provides excellent HTML/CSS support
  • Handles complex layouts well
  • Good table support
  • JavaScript execution capability

Cons:

  • Requires external binary installation
  • Potential security concerns with executing external processes
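  • Memory-intensive for large documents
  • The project is archived and no longer maintained, and its Qt WebKit engine predates newer CSS such as grid layout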
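
The external-process concern is easier to manage when the command is built as an argument list (no shell, so file names cannot inject commands) and when JavaScript is switched off for untrusted input. A sketch, assuming a wkhtmltopdf 0.12.x binary on the PATH; the class and method names are hypothetical, while --quiet, --page-size, --encoding and --disable-javascript are standard wkhtmltopdf options:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the argument vector handed to ProcessBuilder.
final class WkHtmlToPdfCommand {
    private WkHtmlToPdfCommand() {}

    static List<String> build(String inputPath, String outputPath, String pageSize, boolean trustedInput) {
        List<String> cmd = new ArrayList<>();
        cmd.add("wkhtmltopdf");
        cmd.add("--quiet");                  // suppress progress output
        cmd.add("--page-size");
        cmd.add(pageSize);                   // e.g. "A4" or "Letter"
        cmd.add("--encoding");
        cmd.add("UTF-8");                    // match how the input file was written
        if (!trustedInput) {
            cmd.add("--disable-javascript"); // do not run scripts from untrusted HTML
        }
        cmd.add(inputPath);
        cmd.add(outputPath);                 // wkhtmltopdf expects the output file last
        return cmd;
    }
}
```

In the wrapper above, new ProcessBuilder(WkHtmlToPdfCommand.build(htmlFile.getAbsolutePath(), outputPath, "A4", false)) would replace the hard-coded three-element command.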

Solution 4: CSS2XSLFO with Apache FOP

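This approach converts XHTML/CSS to XSL-FO, then uses Apache FOP to generate the PDF. It can produce suboptimal results with tables, though careful tuning of the conversion stylesheet improves them. The example uses a plain XSLT stylesheet for the first step; html2fo.xsl is a placeholder for whichever XHTML-to-FO stylesheet (or the CSS2XSLFO tool itself) performs that conversion.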

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class CSS2XSLFOExample {
    // xhtml: the well-formed XHTML document as a string
    // html2fo.xsl: placeholder for your XHTML-to-XSL-FO stylesheet
    public static void convertToPdf(String xhtml, String outputPdf) throws Exception {
        // Apache FOP 2.x: the base URI resolves relative resources such as images
        FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
        TransformerFactory factory = TransformerFactory.newInstance();

        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(outputPdf))) {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);

            // Step 1: the XSLT stylesheet turns XHTML into XSL-FO...
            Transformer transformer = factory.newTransformer(
                new StreamSource(new File("html2fo.xsl")));

            // Step 2: ...and the FO events stream straight into FOP, which renders
            // the PDF, so no intermediate .fo file is needed
            transformer.transform(
                new StreamSource(new StringReader(xhtml)),
                new SAXResult(fop.getDefaultHandler()));
        }
    }
}
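
To see what the first step produces, the JDK's built-in XSLT processor is enough. The stylesheet below is a deliberately tiny stand-in for html2fo.xsl: it maps only <h1> and <p>, assumes input without an XHTML namespace declaration, and ignores tables entirely, which is exactly where a real stylesheet needs the most work. MiniHtmlToFo is a hypothetical name:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical toy converter: shows the XHTML-to-XSL-FO step in isolation.
final class MiniHtmlToFo {
    private MiniHtmlToFo() {}

    // One A4 page master; only <h1> and <p> (without a namespace) are mapped.
    private static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/'>"
        + "<fo:root><fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4' page-width='210mm' page-height='297mm' margin='20mm'>"
        + "<fo:region-body/></fo:simple-page-master></fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'><fo:flow flow-name='xsl-region-body'>"
        + "<xsl:apply-templates select='//h1 | //p'/>"
        + "</fo:flow></fo:page-sequence></fo:root>"
        + "</xsl:template>"
        + "<xsl:template match='h1'>"
        + "<fo:block font-size='18pt' font-weight='bold'><xsl:value-of select='.'/></fo:block>"
        + "</xsl:template>"
        + "<xsl:template match='p'>"
        + "<fo:block space-after='6pt'><xsl:value-of select='.'/></fo:block>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the JDK's XSLT processor and returns the XSL-FO document as text.
    static String toFo(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter fo = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(fo));
            return fo.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XHTML to XSL-FO transformation failed", e);
        }
    }
}
```

Printing toFo("<html><body><h1>Report</h1><p>Total: 42</p></body></html>") shows an fo:root with one page sequence and two fo:block elements, which is the input FOP renders in step 2.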

Pros:

  • Complete separation of content and formatting
  • Highly customizable output
  • Strong typographic control

Cons:

  • Complex implementation
  • Table layouts often come out poorly without careful stylesheet tuning
  • Steep learning curve for XSL-FO

Solution 5: JRex (Java API for Gecko)

JRex provides a Java wrapper around Mozilla’s Gecko rendering engine, offering another approach for converting HTML to PDF.

// Conceptual sketch only: the JRex class and its render/saveAsPdf/close methods
// below are hypothetical placeholders, not JRex's documented API
import org.mozilla.jrex.JRex;

public class JRexExample {
    public static void convertToPdf(String html, String outputPath) {
        JRex jrex = new JRex(); // Initialize the Gecko engine
        jrex.render(html);      // Render the HTML content
        jrex.saveAsPdf(outputPath); // Save as PDF
        jrex.close();           // Clean up resources
    }
}

Pros:

  • Good rendering quality with Gecko engine
  • Supports the web standards implemented by the embedded Gecko version
  • JavaScript execution capability

Cons:

  • Complex setup and dependencies
  • Less mainstream than other solutions
  • Limited documentation and examples

Solution 6: PDFKit with JNI

For developers willing to use JNI (Java Native Interface), PDFKit provides another option for HTML to PDF conversion.

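// Conceptual sketch: "pdfkit_jni" is a hypothetical native library you would build
// yourself in C or C++ against the JNI header that `javac -h` generates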
public class PDFKitExample {
    static {
        System.loadLibrary("pdfkit_jni"); // Load native library
    }
    
    private native void convertHtmlToPdfNative(String html, String outputPath);
    
    public void convertHtmlToPdf(String html, String outputPath) {
        convertHtmlToPdfNative(html, outputPath);
    }
}

Pros:

  • Fast rendering performance
  • Good CSS support
  • Modern browser rendering capability

Cons:

  • Platform-specific implementation
  • JNI complexity
  • Maintenance overhead
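
The static initializer in the sketch above makes the whole class unusable on any platform where the native library is missing: class initialization fails, and later uses of the class throw NoClassDefFoundError. One way to soften the platform-specific drawback is to probe for the library and fall back to a pure-Java engine. A sketch using only System.loadLibrary; NativeLibrary and tryLoad are hypothetical names, and pdfkit_jni remains the hypothetical library from the example:

```java
// Hypothetical helper: probes for a native library instead of failing class initialization.
final class NativeLibrary {
    private NativeLibrary() {}

    // Loads the named library from java.library.path. Returns false instead of
    // throwing when the platform-specific binary is missing or loading is forbidden.
    static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }
}
```

At startup, if NativeLibrary.tryLoad("pdfkit_jni") returns false, the application can route conversions to a pure-Java option such as the Flying Saucer example instead.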

Performance Comparison

When evaluating HTML to PDF solutions, weigh rendering quality, speed, memory usage, and CSS support. The table gives a rough comparison:

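Solution      | Rendering Quality | Speed  | Memory Usage | CSS Support
--------------|-------------------|--------|--------------|------------
iText         | Good              | Fast   | Low          | Basic
Flying Saucer | Very Good         | Medium | Medium       | Good
wkhtmltopdf   | Excellent         | Medium | High         | Excellent
CSS2XSLFO     | Fair              | Slow   | Medium       | Limited
JRex          | Good              | Medium | High         | Very Good
PDFKit        | Very Good         | Fast   | Medium       | Excellent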
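
Ratings like these are a starting point; speed on your own documents is easy to measure. A rough timing sketch, assuming each engine is adapted to a common interface; HtmlToPdfConverter and ConverterBenchmark are hypothetical names. Conveniently, the convertToPdf and convertHtmlToPdf methods of the iText and Flying Saucer examples already match the interface's signature:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// Hypothetical common interface; each engine in this article could be wrapped to fit it.
interface HtmlToPdfConverter {
    void convert(String html, OutputStream out) throws Exception;
}

// Hypothetical timing harness for rough comparisons.
final class ConverterBenchmark {
    private ConverterBenchmark() {}

    // Returns average wall-clock milliseconds per conversion. Warm-up runs are
    // executed first and discarded, so one-time costs such as class loading and
    // early JIT compilation weigh less on the measured average.
    static double averageMillis(HtmlToPdfConverter converter, String html,
                                int warmupRuns, int measuredRuns) throws Exception {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            converter.convert(html, new ByteArrayOutputStream());
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            // Write to memory so disk I/O does not dominate the timing
            converter.convert(html, new ByteArrayOutputStream());
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / measuredRuns;
    }
}
```

For example, ConverterBenchmark.averageMillis(FlyingSaucerExample::convertToPdf, html, 5, 20) times the Flying Saucer example on a representative document. For publishable numbers, a dedicated harness such as JMH is more reliable than hand-rolled timing.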

Implementation Best Practices

For successful HTML to PDF conversion in Java applications:

  1. Use semantic HTML: Clean, well-structured HTML converts more reliably
  2. Simplify CSS: Avoid advanced CSS features that might not be supported
  3. Test thoroughly: Check rendering across different content types
  4. Consider performance: For high-volume applications, benchmark different solutions
  5. Plan for maintenance: Choose libraries with active development communities
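
For "Test thoroughly", a first automated check is simply that each conversion produced a PDF at all, before anyone reviews the rendering by eye. A minimal sketch, assuming the converter writes to an in-memory buffer; PdfSmokeCheck is a hypothetical name, and the check relies only on a PDF file beginning with the %PDF- header and ending with an %%EOF marker:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical test helper: cheap structural check on converter output.
final class PdfSmokeCheck {
    private PdfSmokeCheck() {}

    // True if the bytes start with "%PDF-" and "%%EOF" appears in the last 1024 bytes
    // (searching a tail window tolerates trailing line breaks). Says nothing about layout.
    static boolean looksLikePdf(byte[] bytes) {
        if (bytes == null || bytes.length < 5) {
            return false;
        }
        // ISO-8859-1 maps every byte to one char, so binary content decodes safely
        String head = new String(bytes, 0, 5, StandardCharsets.ISO_8859_1);
        int tailStart = Math.max(0, bytes.length - 1024);
        String tail = new String(bytes, tailStart, bytes.length - tailStart, StandardCharsets.ISO_8859_1);
        return head.equals("%PDF-") && tail.contains("%%EOF");
    }
}
```

After ITextHtmlToPdf.convertHtmlToPdf(html, buffer) with a ByteArrayOutputStream buffer, a unit test can assert PdfSmokeCheck.looksLikePdf(buffer.toByteArray()). This complements visual review rather than replacing it.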

Conclusion

Converting HTML to PDF in Java applications offers several viable approaches, each with its own strengths and limitations. For simple document-based conversions with tables, Flying Saucer with OpenPDF provides an excellent balance of quality and ease of implementation. For complex web pages that need JavaScript or more CSS than Flying Saucer handles, wkhtmltopdf with a Java wrapper offers higher rendering fidelity.

The best solution depends on your specific requirements, including rendering quality needs, performance constraints, and licensing considerations. Start with a thorough evaluation of your document complexity and rendering requirements before selecting a conversion approach.

Have you implemented HTML to PDF conversion in your Java applications?
