Converting HTML files to PDF is a common requirement in web applications. Whether you need to generate reports, invoices, or documentation, turning web content into portable, print-friendly PDF documents lets you reuse existing HTML and CSS for document output. This guide compares six Java-based approaches to HTML-to-PDF conversion, from pure-Java libraries to wrappers around native browser engines. It includes code examples and trade-offs to help you choose the right one for your project.
Table of Contents
- Why Convert HTML to PDF?
- Understanding the Challenges
- Solution 1: iText Library
- Solution 2: Flying Saucer with OpenPDF
- Solution 3: wkhtmltopdf with Java wrapper
- Solution 4: CSS2XSLFO with Apache FOP
- Solution 5: JRex (Java API for Gecko)
- Solution 6: PDFKit with JNI
- Performance Comparison
- Implementation Best Practices
- Conclusion
Why Convert HTML to PDF?
Converting HTML to PDF combines the flexibility of web content creation with the reliability of PDF document distribution. PDFs keep consistent formatting across devices and print predictably. They also support encryption, access permissions, and digital signatures. For Java developers working with web applications, HTML-to-PDF conversion enables automated report generation, digitally signed documents, and long-term archiving (for example, in the PDF/A format).
Understanding the Challenges
HTML to PDF conversion presents several technical challenges:
- Rendering consistency: Ensuring the PDF looks identical to the HTML page
- Table layouts: Maintaining proper table structure and preventing content overflow
- CSS support: Handling various CSS properties, especially complex layouts
- Performance: Processing large documents efficiently
- Font embedding: Managing fonts correctly in the resulting PDF
Let’s explore how different Java solutions address these challenges.
Solution 1: iText Library
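iText is a widely used Java PDF library with HTML-to-PDF conversion capabilities. Since version 5 it has been dual-licensed under the AGPL and a commercial license, so closed-source or hosted use generally requires a commercial license. It remains one of the most robust options. The example below uses iText 5 with the XMLWorker add-on. In iText 7 and later, HTML conversion is provided by the pdfHTML add-on instead.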
```java
import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;

public class ITextHtmlToPdf {

    // Requires iText 5 (itextpdf) and the XMLWorker add-on on the classpath.
    public static void convertHtmlToPdf(String html, OutputStream outputStream) throws Exception {
        Document document = new Document(PageSize.A4);
        PdfWriter writer = PdfWriter.getInstance(document, outputStream);
        document.open();
        // XMLWorker works best with well-formed XHTML input.
        XMLWorkerHelper.getInstance().parseXHtml(writer, document, new StringReader(html));
        document.close();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Test Document</h1>"
            + "<table><tr><td>Cell 1</td><td>Cell 2</td></tr></table></body></html>";
        // try-with-resources closes the file even if the conversion fails.
        try (OutputStream output = new FileOutputStream("output.pdf")) {
            convertHtmlToPdf(html, output);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
Pros:
- Mature library with extensive documentation
- Good table handling capabilities
- Extensive PDF manipulation features
- Actively developed in iText 7+ (pdfHTML); iText 5 and XMLWorker are end-of-life and receive only security fixes
Cons:
- Limited CSS support in XMLWorker (pdfHTML in iText 7+ is more capable)
- AGPL license: proprietary or hosted use generally requires a commercial license
- Learning curve for advanced features
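XMLWorker expects XHTML-style markup, and Flying Saucer (Solution 2) rejects anything that is not well-formed XML. As a result, ordinary HTML such as an unclosed `<br>` tag can make a conversion fail. A cheap safeguard is to check well-formedness before converting. The following is a minimal sketch that uses only the JDK's built-in XML parser; the `XhtmlCheck` class name is hypothetical. The sketch does not load the XHTML DTD, so named entities such as `&nbsp;` are reported as errors. Use numeric references such as `&#160;` instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Reports whether markup is well-formed XML, i.e. usable as XHTML renderer input. */
final class XhtmlCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Do not fetch the external XHTML DTD; only syntax is checked here.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // parse error: the markup is not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser setup failed", e);
        }
    }
}
```

If the check fails, you can clean the markup with an HTML parser such as jsoup and serialize it as XML before passing it to the renderer.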
Solution 2: Flying Saucer with OpenPDF
Flying Saucer (also known as xhtmlrenderer), combined with OpenPDF as its PDF backend, renders well-formed XHTML styled with CSS 2.1. It produces excellent results, especially for document-centric HTML content.
```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class FlyingSaucerExample {

    // Requires the OpenPDF-based Flying Saucer PDF module on the classpath.
    public static void convertToPdf(String html, OutputStream outputStream) throws Exception {
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocumentFromString(html); // input must be well-formed XHTML
        renderer.layout();                    // lay out pages using the document's CSS
        renderer.createPDF(outputStream);
    }

    public static void main(String[] args) {
        String html = "<html><head><style>table { border-collapse: collapse; }</style></head>"
            + "<body><h1>Test Document</h1>"
            + "<table border='1'><tr><td>Cell 1</td><td>Cell 2</td></tr></table></body></html>";
        // try-with-resources closes the file even if the conversion fails.
        try (OutputStream output = new FileOutputStream("flying-saucer-output.pdf")) {
            convertToPdf(html, output);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
Pros:
- Better CSS support than basic iText
- Open-source and free for commercial use
- Good rendering quality for document-like content
- Supports most of CSS 2.1, plus paged-media rules such as @page
Cons:
- Requires well-formed XHTML input
- No support for modern CSS layout such as Flexbox and Grid
- No JavaScript support
- May have issues with complex layouts
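Long tables are where paged output differs most from a browser. One common fix is a print stylesheet that repeats table headers on each page and keeps rows from splitting across pages. The following is a minimal sketch that injects such a stylesheet into an XHTML string before rendering. The `PrintCss` class is hypothetical, and `-fs-table-paginate` is a Flying Saucer-specific extension. Whether each property is honored depends on the renderer you use.

```java
/** Injects paged-media CSS that helps long tables break cleanly across PDF pages. */
final class PrintCss {

    // thead/tfoot repeat on every page; rows are kept whole where possible.
    // -fs-table-paginate is a Flying Saucer extension; other renderers ignore it.
    static final String TABLE_PAGING_CSS =
        "@page { size: A4; margin: 20mm; }\n"
        + "table { border-collapse: collapse; -fs-table-paginate: paginate; }\n"
        + "thead { display: table-header-group; }\n"
        + "tfoot { display: table-footer-group; }\n"
        + "tr { page-break-inside: avoid; }\n";

    /** Inserts a style element just before </head>; throws if the document has no head. */
    static String withTablePaging(String xhtml) {
        int headEnd = xhtml.indexOf("</head>");
        if (headEnd < 0) {
            throw new IllegalArgumentException("Document has no </head> element");
        }
        return xhtml.substring(0, headEnd)
            + "<style type=\"text/css\">\n" + TABLE_PAGING_CSS + "</style>"
            + xhtml.substring(headEnd);
    }
}
```

For example, you could call `renderer.setDocumentFromString(PrintCss.withTablePaging(html))` in the Flying Saucer example. The sketch assumes lowercase XHTML with a head element. If there is no head, it throws rather than guessing where to put the styles.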
Solution 3: wkhtmltopdf with Java wrapper
wkhtmltopdf is a command-line tool built on the Qt WebKit rendering engine. Calling it from Java through ProcessBuilder gives you browser-grade rendering without a Java rendering library. Be aware that the project was archived in 2023 and no longer receives fixes.
```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WkHtmlToPdfWrapper {

    // Requires the wkhtmltopdf binary to be installed and on the PATH.
    public static void convertHtmlToPdf(String html, String outputPath)
            throws IOException, InterruptedException {
        // Save HTML to a temp file as UTF-8 (FileWriter would use the platform default charset)
        Path htmlFile = Files.createTempFile("input", ".html");
        try {
            Files.write(htmlFile, html.getBytes(StandardCharsets.UTF_8));
            // inheritIO() stops the child's output from filling an unread pipe and blocking waitFor()
            Process process = new ProcessBuilder(
                    "wkhtmltopdf", htmlFile.toAbsolutePath().toString(), outputPath)
                .inheritIO()
                .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("wkhtmltopdf process failed with exit code: " + exitCode);
            }
        } finally {
            // Clean up even if the conversion fails
            Files.deleteIfExists(htmlFile);
        }
    }

    public static void main(String[] args) {
        try {
            String html = "<html><body><h1>WebKit Test</h1><table><tr><td>Cell 1</td></tr></table></body></html>";
            convertHtmlToPdf(html, "wkhtmltopdf-output.pdf");
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
    }
}
```
Pros:
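- A real browser engine (Qt WebKit) handles HTML/CSS far better than pure-Java renderers, although its WebKit version predates modern layout features such as CSS Grid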
- Handles complex layouts well
- Good table support
- JavaScript execution capability
Cons:
- Requires external binary installation
- Security concerns: it runs an external process, and untrusted HTML may be able to read local files unless local file access is disabled
- Memory-intensive for large documents
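The wrapper above passes no options to wkhtmltopdf and waits indefinitely for it to finish. The following sketch builds the command with a few documented options (`--quiet`, `--encoding`, `--page-size`, `--disable-local-file-access`, `--disable-javascript`) and kills the process if it exceeds a timeout. The `WkCommand` class and its parameters are hypothetical. Option behavior can vary between wkhtmltopdf versions, so check `wkhtmltopdf --help` for your installation.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Builds and runs a hardened wkhtmltopdf command line (hypothetical helper). */
final class WkCommand {

    static List<String> build(Path input, Path output, String pageSize, boolean allowJavaScript) {
        List<String> cmd = new ArrayList<>();
        cmd.add("wkhtmltopdf");
        cmd.add("--quiet");                      // suppress progress output
        cmd.add("--encoding");
        cmd.add("utf-8");                        // default text encoding for the input
        cmd.add("--page-size");
        cmd.add(pageSize);                       // e.g. "A4" or "Letter"
        cmd.add("--disable-local-file-access");  // stop the HTML from reading other local files
        if (!allowJavaScript) {
            cmd.add("--disable-javascript");
        }
        cmd.add(input.toString());
        cmd.add(output.toString());
        return cmd;
    }

    /** Runs the command and kills the process if it exceeds the timeout. */
    static void run(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException {
        // inheritIO() sends the child's output to this JVM's console, so no pipe fills up.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("wkhtmltopdf timed out after " + timeoutSeconds + " s");
        }
        if (process.exitValue() != 0) {
            throw new IOException("wkhtmltopdf failed with exit code " + process.exitValue());
        }
    }
}
```

For example, you could call `WkCommand.run(WkCommand.build(htmlFile, Path.of("report.pdf"), "A4", false), 60)`. Disabling JavaScript and local file access is a sensible default when the HTML contains user-supplied content.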
Solution 4: CSS2XSLFO with Apache FOP
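This approach converts XHTML and CSS to XSL-FO, then uses Apache FOP to generate the PDF. It tends to produce weaker table layouts than the other options, although careful stylesheet configuration can improve the results. The example below performs the first step with a generic XSLT stylesheet (html2fo.xsl, which you must supply) rather than with the CSS2XSLFO tool itself.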
```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class CSS2XSLFOExample {

    // html is XHTML markup (not a file path); html2fo.xsl is a stylesheet you must supply.
    public static void convertToPdf(String html, String outputPdf) throws Exception {
        // Step 1: Convert XHTML to XSL-FO using XSLT
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(
            new StreamSource(new File("html2fo.xsl")));
        File foFile = File.createTempFile("output", ".fo");
        try {
            transformer.transform(
                new StreamSource(new StringReader(html)),
                new StreamResult(foFile));
            // Step 2: Convert XSL-FO to PDF using Apache FOP (2.x API)
            FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(outputPdf))) {
                Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
                // Identity transform streams the FO file into FOP's SAX handler
                Transformer identity = factory.newTransformer();
                identity.transform(new StreamSource(foFile), new SAXResult(fop.getDefaultHandler()));
            }
        } finally {
            foFile.delete();
        }
    }
}
```
Pros:
- Complete separation of content and formatting
- Highly customizable output
- Strong typographic control
Cons:
- Complex implementation
- Table layouts are often weaker than with the other approaches
- Steep learning curve for XSL-FO
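The first step of this pipeline is plain XSLT, which the JDK can run without FOP. The following minimal sketch shows what an html2fo-style stylesheet does. It maps only `h1` and `p` elements to `fo:block` inside an A4 page layout, whereas a real stylesheet must also handle tables, lists, and inline formatting. `TinyHtmlToFo` is a hypothetical name. The sketch also assumes input elements without the XHTML namespace, because namespaced elements would not match these templates.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Demonstrates the XHTML-to-XSL-FO step with a tiny inline stylesheet (no FOP needed). */
final class TinyHtmlToFo {

    // Wraps the body in an A4 page sequence and maps h1/p to fo:block elements.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/html'>"
        + "<fo:root><fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4' page-width='210mm' page-height='297mm' margin='20mm'>"
        + "<fo:region-body/></fo:simple-page-master></fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'><fo:flow flow-name='xsl-region-body'>"
        + "<xsl:apply-templates select='body/*'/>"
        + "</fo:flow></fo:page-sequence></fo:root>"
        + "</xsl:template>"
        + "<xsl:template match='h1'><fo:block font-size='18pt' font-weight='bold'>"
        + "<xsl:value-of select='.'/></fo:block></xsl:template>"
        + "<xsl:template match='p'><fo:block space-after='6pt'>"
        + "<xsl:value-of select='.'/></fo:block></xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms well-formed, non-namespaced XHTML into an XSL-FO document string. */
    static String toFo(String xhtml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }
}
```

You can feed the resulting XSL-FO into FOP the same way the example above feeds in its temporary .fo file.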
Solution 5: JRex (Java API for Gecko)
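JRex provides a Java wrapper around Mozilla’s Gecko rendering engine, offering another route from HTML to PDF. The project has not been actively developed for many years. The code below is therefore conceptual: its interface and method names are placeholders, not JRex's documented API.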
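```java
// Conceptual sketch only: GeckoRenderer is a hypothetical interface standing in for
// whatever an embedded Gecko wrapper such as JRex exposes; it is not JRex's real API.
interface GeckoRenderer extends AutoCloseable {
    void render(String html);          // load and lay out the HTML content
    void saveAsPdf(String outputPath); // print the laid-out page to a PDF file
}

public class JRexExample {
    public static void convertToPdf(GeckoRenderer renderer, String html, String outputPath)
            throws Exception {
        // try-with-resources releases the native engine's resources when done
        try (GeckoRenderer r = renderer) {
            r.render(html);
            r.saveAsPdf(outputPath);
        }
    }
}
```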
Pros:
- Good rendering quality with Gecko engine
- Gecko-based rendering, though standards support is limited to the old Gecko version JRex embeds
- JavaScript execution capability
Cons:
- Complex setup and dependencies
- Less mainstream than other solutions
- Limited documentation and examples
Solution 6: PDFKit with JNI
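For developers willing to use JNI (Java Native Interface), wrapping a native HTML-to-PDF library is another option. The tools commonly called PDFKit are not Java libraries: the Ruby pdfkit gem wraps wkhtmltopdf, and Apple's PDFKit framework displays and edits PDFs on macOS. The code below therefore assumes a hypothetical native library, pdfkit_jni, whose C/C++ side you would write yourself.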
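```java
// Conceptual sketch: "pdfkit_jni" is a hypothetical native library you would build yourself.
public class PDFKitExample {
    static {
        // Loads e.g. libpdfkit_jni.so (Linux) or pdfkit_jni.dll (Windows) from java.library.path
        System.loadLibrary("pdfkit_jni");
    }

    // Implemented in C/C++; generate the matching header with `javac -h`
    private native void convertHtmlToPdfNative(String html, String outputPath);

    public void convertHtmlToPdf(String html, String outputPath) {
        convertHtmlToPdfNative(html, outputPath);
    }
}
```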
Pros:
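- Potentially fast, since no external process is started per conversion
- CSS support depends on the native engine you wrap
- Can reach modern browser rendering quality if the wrapped engine provides it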
Cons:
- Platform-specific implementation
- JNI complexity
- Maintenance overhead
Performance Comparison
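The table below is a rough, qualitative comparison, not the result of measured benchmarks. Actual speed and memory use depend heavily on document size and content, so measure with representative documents before deciding: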
Solution | Rendering Quality | Speed | Memory Usage | CSS Support |
---|---|---|---|---|
iText | Good | Fast | Low | Basic |
Flying Saucer | Very Good | Medium | Medium | Good |
wkhtmltopdf | Excellent | Medium | High | Excellent |
CSS2XSLFO | Fair | Slow | Medium | Limited |
JRex | Good | Medium | High | Very Good |
PDFKit | Very Good | Fast | Medium | Excellent |
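Because the ratings above are qualitative, it is worth timing candidates on your own documents. The following is a rough sketch of a timing harness. It runs a few warm-up conversions so the JIT can compile hot paths, then reports the median time per conversion. The `ConverterBenchmark` class and its `HtmlToPdf` interface are hypothetical. For rigorous measurements, a dedicated tool such as JMH is a better choice, and memory usage needs separate profiling.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Arrays;

/** A rough timing harness for comparing HTML-to-PDF converters on your own documents. */
final class ConverterBenchmark {

    /** Hypothetical adapter interface; any converter taking (html, stream) fits it. */
    @FunctionalInterface
    interface HtmlToPdf {
        void convert(String html, OutputStream out) throws Exception;
    }

    /** Returns the median wall-clock time in milliseconds over the measured runs. */
    static double medianMillis(HtmlToPdf converter, String html, int warmups, int runs)
            throws Exception {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        // Warm-up runs are executed but not timed.
        for (int i = 0; i < warmups; i++) {
            converter.convert(html, new ByteArrayOutputStream());
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            ByteArrayOutputStream sink = new ByteArrayOutputStream(); // in-memory output, no disk I/O
            long start = System.nanoTime();
            converter.convert(html, sink);
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        double medianNanos = (runs % 2 == 1)
            ? nanos[runs / 2]
            : (nanos[runs / 2 - 1] + nanos[runs / 2]) / 2.0;
        return medianNanos / 1_000_000.0;
    }
}
```

For example, `ConverterBenchmark.medianMillis(FlyingSaucerExample::convertToPdf, html, 5, 20)` adapts the Flying Saucer method directly because its signature matches the interface.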
Implementation Best Practices
For successful HTML to PDF conversion in Java applications:
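- Use well-formed, semantic HTML: clean, well-structured markup converts more reliably, and the XML-based tools (XMLWorker, Flying Saucer, XSLT pipelines) need well-formed XHTML
- Simplify CSS: avoid features your renderer may not support; Flying Saucer, for example, supports neither Flexbox nor Grid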
- Test thoroughly: Check rendering across different content types
- Consider performance: For high-volume applications, benchmark different solutions
- Plan for maintenance: Choose libraries with active development communities
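Clean markup also depends on escaping dynamic values. A single unescaped `&` or `<` in a customer name produces malformed XHTML that the XML-based converters reject, and it can let untrusted input inject markup. The following minimal sketch escapes text before inserting it into an XHTML template. The `Xhtml` class and its `row` helper are hypothetical. In production, a template engine with automatic escaping (such as Thymeleaf) is usually the better choice.

```java
/** Escapes text for safe use in XHTML element content or attribute values. */
final class Xhtml {

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds one table row from untrusted cell values, escaping each one. */
    static String row(String... cells) {
        StringBuilder sb = new StringBuilder("<tr>");
        for (String cell : cells) {
            sb.append("<td>").append(escape(cell)).append("</td>");
        }
        return sb.append("</tr>").toString();
    }
}
```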
Conclusion
Converting HTML to PDF in Java applications offers several viable approaches, each with its own strengths and limitations. For simple document-based conversions with tables, Flying Saucer with OpenPDF provides an excellent balance of quality and ease of implementation. For complex web pages that need a real browser engine, wkhtmltopdf with a Java wrapper offers superior rendering fidelity. Because the project is no longer maintained, however, it is a riskier long-term choice.
The best solution depends on your specific requirements, including rendering quality needs, performance constraints, and licensing considerations. Start with a thorough evaluation of your document complexity and rendering requirements before selecting a conversion approach.
Have you implemented HTML to PDF conversion in your Java applications?