Converting HTML To PDF: 6 Powerful Java Solutions For Developers

Converting HTML to PDF is a common requirement in web applications. Whether you need to generate reports, invoices, or documentation, turning web content into a PDF gives you a portable, print-friendly document with a stable layout. This guide explores six Java-based approaches to HTML to PDF conversion, with code examples and notes to help you choose the right one for your project.

Table of Contents

Why Convert HTML to PDF?
Understanding the Challenges
Solution 1: iText Library
Solution 2: Flying Saucer with OpenPDF
Solution 3: wkhtmltopdf with Java wrapper
Solution 4: CSS2XSLFO with Apache FOP
Solution 5: JRex (Java API for Gecko)
Solution 6: PDFKit with JNI
Performance Comparison
Implementation Best Practices
Conclusion

Why Convert HTML to PDF?

Converting HTML to PDF combines the flexibility of web content creation with the reliability of PDF document distribution. PDFs keep the same formatting across devices, print predictably, and can be encrypted and digitally signed. For Java developers working with web applications, HTML to PDF conversion enables automated report generation, digital document signing, and archiving.

Understanding the Challenges

HTML to PDF conversion presents several technical challenges:

Rendering consistency: Ensuring the PDF looks identical to the HTML page
Table layouts: Maintaining proper table structure and preventing content overflow
CSS support: Handling various CSS properties, especially complex layouts
Performance: Processing large documents efficiently
Font embedding: Managing fonts correctly in the resulting PDF

Let’s explore how different Java solutions address these challenges.

Solution 1: iText Library

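iText is a widely used Java PDF library that offers HTML to PDF conversion. iText 5 and later are dual-licensed under the AGPL and a commercial license, so closed-source production use generally requires a commercial license. It remains one of the most robust options. The example below uses the iText 5 API together with the XML Worker add-on, which expects well-formed XHTML as input.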

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;

import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class ITextHtmlToPdf {
    // Parses well-formed XHTML and writes the resulting PDF to the given stream
    public static void convertHtmlToPdf(String html, OutputStream outputStream) throws Exception {
        Document document = new Document(PageSize.A4);
        PdfWriter writer = PdfWriter.getInstance(document, outputStream);
        document.open();
        
        XMLWorkerHelper.getInstance().parseXHtml(writer, document, 
            new StringReader(html));
            
        document.close();
    }
    
    public static void main(String[] args) {
        String html = "<html><body><h1>Test Document</h1>" +
                      "<table><tr><td>Cell 1</td><td>Cell 2</td></tr></table></body></html>";
        // try-with-resources closes the file even if the conversion fails
        try (FileOutputStream output = new FileOutputStream("output.pdf")) {
            convertHtmlToPdf(html, output);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Pros:

Mature library with extensive documentation
Good table handling capabilities
Extensive PDF manipulation features
Well-maintained with regular updates

Cons:

Limited CSS support in the XML Worker add-on used above
AGPL or commercial license required for iText 5 and later
Learning curve for advanced features

Solution 2: Flying Saucer with OpenPDF

Flying Saucer (also known as XHTML renderer) combined with OpenPDF offers excellent rendering quality, especially for document-centric HTML content.

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.xhtmlrenderer.pdf.ITextRenderer;

public class FlyingSaucerExample {
    // The input must be well-formed XHTML; Flying Saucer parses it as XML
    public static void convertToPdf(String html, OutputStream outputStream) throws Exception {
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocumentFromString(html);
        renderer.layout();
        renderer.createPDF(outputStream);
    }
    
    public static void main(String[] args) {
        String html = "<html><head><style>table { border-collapse: collapse; }</style></head>" +
                     "<body><h1>Test Document</h1>" +
                     "<table border='1'><tr><td>Cell 1</td><td>Cell 2</td></tr></table></body></html>";
        // try-with-resources closes the file even if the conversion fails
        try (FileOutputStream output = new FileOutputStream("flying-saucer-output.pdf")) {
            convertToPdf(html, output);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Pros:

Better CSS support than basic iText
Open-source and free for commercial use
Good rendering quality for document-like content
Supports CSS2.1 spec (mostly)

Cons:

Limited support for modern CSS features
No JavaScript support
May have issues with complex layouts
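
Because Flying Saucer parses its input as XML, markup that browsers tolerate (an unclosed <br>, for instance) is rejected before layout even starts. The following is a minimal sketch of a pre-flight check that uses only the JDK's XML parser. The names XhtmlCheck and isWellFormed are illustrative and not part of Flying Saucer or OpenPDF, and the sketch is meant for markup without a DOCTYPE declaration, since documents that declare one may trigger DTD lookups.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of Flying Saucer or OpenPDF.
final class XhtmlCheck {
    private XhtmlCheck() {}

    /** Returns true if the markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Apply the JDK's secure-processing limits to the parser
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Any parse failure means the markup is not well-formed
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }
}
```

Calling XhtmlCheck.isWellFormed(html) before renderer.setDocumentFromString(html) lets you fail early with a clear message, or route the input through an HTML cleaner first.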

Solution 3: wkhtmltopdf with Java wrapper

wkhtmltopdf is a command-line tool that renders pages with the Qt WebKit engine. By wrapping the tool in a small Java class, you can use its rendering capabilities from your application.

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class WkHtmlToPdfWrapper {
    public static void convertHtmlToPdf(String html, String outputPath) throws Exception {
        // Save HTML to a temp file, encoded as UTF-8
        File htmlFile = File.createTempFile("input", ".html");
        try {
            Files.write(htmlFile.toPath(), html.getBytes(StandardCharsets.UTF_8));
            
            // Build process; arguments are passed as a list, so no shell is involved
            ProcessBuilder processBuilder = new ProcessBuilder(
                "wkhtmltopdf",
                htmlFile.getAbsolutePath(),
                outputPath
            );
            // Forward the tool's output to this JVM's console so a full pipe cannot block it
            processBuilder.inheritIO();
            
            // Execute and wait
            Process process = processBuilder.start();
            int exitCode = process.waitFor();
            
            if (exitCode != 0) {
                throw new RuntimeException("wkhtmltopdf process failed with exit code: " + exitCode);
            }
        } finally {
            // Clean up the temp file even if the conversion fails
            htmlFile.delete();
        }
    }
    
    public static void main(String[] args) {
        try {
            String html = "<html><body><h1>WebKit Test</h1><table><tr><td>Cell 1</td></tr></table></body></html>";
            convertHtmlToPdf(html, "wkhtmltopdf-output.pdf");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Pros:

WebKit rendering engine provides excellent HTML/CSS support
Handles complex layouts well
Good table support
JavaScript execution capability

Cons:

Requires external binary installation
Potential security concerns with executing external processes
Memory-intensive for large documents
No longer actively maintained (the project's repository was archived in 2023)
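
The concerns about external processes can be reduced by building the command as an argument list (so nothing is interpreted by a shell), disabling JavaScript for untrusted input, and enforcing a timeout. The sketch below shows one way to do that with the JDK alone. The class and method names are hypothetical, it assumes the wkhtmltopdf binary is on the PATH, and it relies on the tool's --quiet and --disable-javascript options.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for launching wkhtmltopdf more defensively.
final class WkHtmlToPdfCommand {
    private WkHtmlToPdfCommand() {}

    /** Builds the argument list; each element is passed to the process verbatim. */
    static List<String> build(Path input, Path output, boolean allowJavaScript) {
        List<String> command = new ArrayList<>();
        command.add("wkhtmltopdf");           // assumed to be on the PATH
        command.add("--quiet");               // suppress progress output
        if (!allowJavaScript) {
            command.add("--disable-javascript");
        }
        command.add(input.toAbsolutePath().toString());
        command.add(output.toAbsolutePath().toString());
        return command;
    }

    /** Runs the command and returns its exit code, killing it after the timeout. */
    static int run(List<String> command, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // Java 9+
                .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IllegalStateException("wkhtmltopdf timed out after " + timeoutSeconds + "s");
            }
            return process.exitValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for wkhtmltopdf", e);
        }
    }
}
```

A caller would write the HTML to a temporary file, call WkHtmlToPdfCommand.run(WkHtmlToPdfCommand.build(input, output, false), 30), and treat any non-zero exit code as a failure.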

Solution 4: CSS2XSLFO with Apache FOP

This approach converts HTML/CSS to XSL-FO, then uses Apache FOP to generate the PDF. It is known for producing suboptimal results with tables, but the output can be improved with proper configuration.

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.MimeConstants;

public class CSS2XSLFOExample {
    public static void convertToPdf(String htmlFile, String outputPdf) throws Exception {
        // Step 1: Convert HTML/CSS to XSL-FO using XSLT
        // html2fo.xsl is a stylesheet you supply; the input must be well-formed XHTML
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(
            new StreamSource(new File("html2fo.xsl")));
            
        File foFile = File.createTempFile("output", ".fo");
        try {
            // htmlFile is a path, so read it as a file rather than as markup
            transformer.transform(
                new StreamSource(new File(htmlFile)),
                new StreamResult(foFile));
                
            // Step 2: Convert XSL-FO to PDF using Apache FOP
            FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(outputPdf))) {
                Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
                // An identity transformer feeds the FO file into FOP's SAX handler
                Transformer identity = factory.newTransformer();
                identity.transform(new StreamSource(foFile), new SAXResult(fop.getDefaultHandler()));
            }
        } finally {
            // Remove the intermediate FO file even if a step fails
            foFile.delete();
        }
    }
}

Pros:

Complete separation of content and formatting
Highly customizable output
Strong typographic control

Cons:

Complex implementation
Table layout issues
Steep learning curve for XSL-FO
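
To make the first step of this pipeline more concrete, here is a minimal sketch of an XHTML to XSL-FO transformation that runs on the JDK's built-in XSLT processor. The inline stylesheet is a hypothetical stand-in for html2fo.xsl: it assumes elements without the XHTML namespace, gives h1 its own template, and turns every other child of body into a plain block. A real stylesheet would also need to map tables to fo:table and translate CSS properties.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical, heavily simplified stand-in for an html2fo.xsl stylesheet.
final class XhtmlToFoSketch {
    private XhtmlToFoSketch() {}

    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:fo='http://www.w3.org/1999/XSL/Format'>",
        "  <xsl:template match='/html'>",
        "    <fo:root>",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name='A4'",
        "            page-width='21cm' page-height='29.7cm' margin='2cm'>",
        "          <fo:region-body/>",
        "        </fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <fo:page-sequence master-reference='A4'>",
        "        <fo:flow flow-name='xsl-region-body'>",
        "          <xsl:apply-templates select='body/*'/>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "  <xsl:template match='h1'>",
        "    <fo:block font-size='18pt' font-weight='bold'><xsl:value-of select='.'/></fo:block>",
        "  </xsl:template>",
        "  <xsl:template match='*'>",
        "    <fo:block><xsl:value-of select='.'/></fo:block>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Transforms well-formed XHTML into an XSL-FO document string. */
    static String toXslFo(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Input could not be transformed to XSL-FO", e);
        }
    }
}
```

The returned string is what Apache FOP would consume in the second step. Inspecting it directly is often the quickest way to see why a table or block does not lay out as expected.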

Solution 5: JRex (Java API for Gecko)

JRex provides a Java wrapper around Mozilla’s Gecko rendering engine, offering another approach for converting HTML to PDF.

// Note: conceptual sketch only. The class and method names below are
// illustrative placeholders and may not match the actual JRex API.
import org.mozilla.jrex.JRex;

public class JRexExample {
    public static void convertToPdf(String html, String outputPath) {
        JRex jrex = new JRex(); // Initialize the Gecko engine
        jrex.render(html);      // Render the HTML content
        jrex.saveAsPdf(outputPath); // Save as PDF
        jrex.close();           // Clean up resources
    }
}

Pros:

Good rendering quality with Gecko engine
Supports modern web standards
JavaScript execution capability

Cons:

Complex setup and dependencies
Less mainstream than other solutions
Limited documentation and examples

Solution 6: PDFKit with JNI

For developers willing to use JNI (Java Native Interface), PDFKit provides another option for HTML to PDF conversion.

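// Conceptual code for PDFKit with JNI. "pdfkit_jni" stands for a native
// wrapper library that you would have to build and ship yourself.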
public class PDFKitExample {
    static {
        System.loadLibrary("pdfkit_jni"); // Load native library
    }
    
    private native void convertHtmlToPdfNative(String html, String outputPath);
    
    public void convertHtmlToPdf(String html, String outputPath) {
        convertHtmlToPdfNative(html, outputPath);
    }
}

Pros:

Fast rendering performance
Good CSS support
Modern browser rendering capability

Cons:

Platform-specific implementation
JNI complexity
Maintenance overhead

Performance Comparison

When evaluating HTML to PDF solutions, consider the following qualitative comparison. The ratings are relative impressions rather than benchmark results:

Solution	Rendering Quality	Speed	Memory Usage	CSS Support
iText	Good	Fast	Low	Basic
Flying Saucer	Very Good	Medium	Medium	Good
wkhtmltopdf	Excellent	Medium	High	Excellent
CSS2XSLFO	Fair	Slow	Medium	Limited
JRex	Good	Medium	High	Very Good
PDFKit	Very Good	Fast	Medium	Excellent

Implementation Best Practices

For successful HTML to PDF conversion in Java applications:

Use semantic HTML: Clean, well-structured HTML converts more reliably
Simplify CSS: Avoid advanced CSS features that might not be supported
Test thoroughly: Check rendering across different content types
Consider performance: For high-volume applications, benchmark different solutions
Plan for maintenance: Choose libraries with active development communities
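
For the benchmarking advice above, a small harness that times each candidate on the same input is usually enough for a first comparison. The sketch below is one possible shape for it. The HtmlToPdfConverter interface and the ConversionBenchmark class are hypothetical names introduced here, and the timing is deliberately crude (wall-clock averages after a warm-up), so a dedicated tool such as JMH is a better fit for rigorous measurements.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// Hypothetical abstraction so different libraries can be timed the same way.
interface HtmlToPdfConverter {
    void convert(String html, OutputStream out) throws Exception;
}

final class ConversionBenchmark {
    private ConversionBenchmark() {}

    /** Average wall-clock time per run and the size of the last output. */
    static final class Result {
        final double averageMillis;
        final int outputBytes;

        Result(double averageMillis, int outputBytes) {
            this.averageMillis = averageMillis;
            this.outputBytes = outputBytes;
        }
    }

    static Result measure(HtmlToPdfConverter converter, String html,
                          int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        try {
            // Warm-up runs give the JIT a chance to compile the hot paths
            for (int i = 0; i < warmupRuns; i++) {
                converter.convert(html, new ByteArrayOutputStream());
            }
            long totalNanos = 0;
            int size = 0;
            for (int i = 0; i < measuredRuns; i++) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                long start = System.nanoTime();
                converter.convert(html, out);
                totalNanos += System.nanoTime() - start;
                size = out.size();
            }
            return new Result(totalNanos / 1_000_000.0 / measuredRuns, size);
        } catch (Exception e) {
            throw new IllegalStateException("Conversion failed during benchmark", e);
        }
    }
}
```

Any method with a matching signature can be plugged in as a method reference, for example ConversionBenchmark.measure(FlyingSaucerExample::convertToPdf, html, 5, 50), which makes it easy to run the same document through several libraries.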

Conclusion

Converting HTML to PDF in Java applications offers several viable approaches, each with its own strengths and limitations. For simple document-based conversions with tables, Flying Saucer with OpenPDF provides an excellent balance of quality and ease of implementation. For complex web pages with modern CSS, wkhtmltopdf with a Java wrapper offers superior rendering fidelity.

The best solution depends on your specific requirements, including rendering quality needs, performance constraints, and licensing considerations. Start with a thorough evaluation of your document complexity and rendering requirements before selecting a conversion approach.

Have you implemented HTML to PDF conversion in your Java applications?
