Office-to-PDF Conversion Options [Java Edition]

The best-performing Office-to-PDF solutions

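After a long search I finally found some workable conversion approaches. Without further ado, the reliable options currently fall into roughly the following categories:

  • Open-source components: OpenOffice/LibreOffice
  • Enterprise APIs: WPS/Office
  • Strong system dependency: documents4j + Windows WPS/Windows Office
  • Pure library dependency: aspose-words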

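Open-source components

Taking LibreOffice as an example, you need to set up an Office server and open an API port for Java or other services to call.

Pros: open source and free
Cons: the converted output sometimes differs from the original, and the service is not very stable

Installation steps

Taking a Linux server as an example, the environment is set up as follows: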

  1. Download the installation package from the official website (LibreOffice, or Apache OpenOffice as used in the steps below)
  2. Extract the downloaded package (Apache_OpenOffice_4.1.14_Linux_x86-64_install-rpm_zh-CN.tar.gz) into /tmp/OpenOffice
  3. Change to /tmp/OpenOffice/zh-CN/RPMS and run yum localinstall *.rpm
  4. After installation, a desktop-integration directory is created in the current directory. Change into desktop-integration and run yum localinstall openoffice4.1.14-redhat-menus-4.1.14-9811.noarch.rpm
  5. Change to /opt/openoffice4/program/. To avoid errors when OpenOffice starts, first run yum install libXext.x86_64 && yum groupinstall "X Window System"
  6. Start the service: nohup /opt/openoffice4/program/soffice -headless -accept="socket,host={{IP}},port={{Port}};urp;" -nofirststartwizard &
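
Once the service from step 6 is running, it can be worth confirming that the port is actually reachable before wiring up a client. The following is a minimal sketch, not part of the original setup: the class name `SofficePortCheck` and the fallback values `127.0.0.1` and `8100` are assumptions for illustration, so substitute the {{IP}} and {{Port}} you passed to soffice. It only checks that something accepts TCP connections on that port; it does not verify that the listener is really OpenOffice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class SofficePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real host and port as arguments
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8100;
        System.out.println(host + ":" + port + " listening: " + isListening(host, port, 2000));
    }
}
```

If the check fails right after startup, the nohup.out file written by the command in step 6 is usually the first place to look.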

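The setup steps for OpenOffice and LibreOffice are essentially the same; only the file names differ, and the install locations are laid out in much the same way.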
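
For occasional conversions, LibreOffice can also be invoked once per file from Java instead of going through the listening port. The sketch below assumes LibreOffice (not OpenOffice) with a `soffice` binary that accepts the `--headless`, `--convert-to` and `--outdir` options; the class name `SofficeCliConverter`, the 120-second timeout and the command-line arguments are illustrative assumptions, not something from the setup above.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

class SofficeCliConverter {

    /** Builds the command line for a one-off headless conversion to PDF. */
    static List<String> buildCommand(String sofficeBinary, Path input, Path outputDir) {
        return Arrays.asList(
                sofficeBinary,
                "--headless",
                "--convert-to", "pdf",
                "--outdir", outputDir.toString(),
                input.toString());
    }

    /** Runs soffice and returns true if it exits with code 0 within the timeout. */
    static boolean convert(String sofficeBinary, Path input, Path outputDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(sofficeBinary, input, outputDir))
                .inheritIO() // show soffice's own output in this process's console
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Kill a hung conversion rather than blocking forever
            process.destroyForcibly();
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: input document, args[1]: output directory (both supplied by the caller)
        boolean ok = convert("soffice", Paths.get(args[0]), Paths.get(args[1]), 120);
        System.out.println(ok ? "converted" : "conversion failed");
    }
}
```

An exit code of 0 is not proof that a PDF was written, so it is safer to also check that the expected file exists in the output directory. Starting a new process per document is slower than a long-running service, which is the trade-off against the socket setup described above.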

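Enterprise APIs

Taking WPS as an example, you need to register and verify an enterprise account, then call the conversion interface over HTTP. See the WPS Open Platform for details.

Pros: good output quality
Cons: closed source, billed per conversion, documents are sent to a third party (risk of data leakage)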

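Strong system dependency

Taking documents4j + Windows WPS as an example, conversion is done through inter-process communication.

Pros: good output quality, data does not leave your environment
Cons: requires a Windows environment; enterprise use requires a license

Setup steps

  1. Install WPS and a JRE on Windows
  2. Write the Java code and run the service

Code example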

pom.xml

<dependencies>
    <dependency>
        <groupId>com.documents4j</groupId>
        <artifactId>documents4j-local</artifactId>
        <version>1.1.12</version>
    </dependency>
    <dependency>
        <groupId>com.documents4j</groupId>
        <artifactId>documents4j-transformer-msoffice-word</artifactId>
        <version>1.1.12</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <!-- Set the Java version and encoding used to compile the project -->
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.7.0</version>
            <configuration>
                <target>1.8</target>
                <source>1.8</source>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.1.0</version>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>Main</mainClass>                
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

*.java

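//Main.java
import org.cikaros.convert.DocumentConvertRemote;
import org.cikaros.convert.IDocumentConvert;

import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class Main {
    public static void main(String[] args) throws RemoteException {
        IDocumentConvert convert = new DocumentConvertRemote();
        // Export the converter on an anonymous port and obtain its RMI stub
        IDocumentConvert skeleton = (IDocumentConvert) UnicastRemoteObject.exportObject(convert, 0);
        // Start an RMI registry on port 10099 and publish the stub under the interface name
        Registry registry = LocateRegistry.createRegistry(10099);
        registry.rebind(IDocumentConvert.class.getName(), skeleton);
    }
}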

//IDocumentConvert.java
package org.cikaros.convert;

import java.rmi.Remote;
import java.rmi.RemoteException;

public interface IDocumentConvert extends Remote {

    byte[] convert(byte[] data) throws RemoteException;

}
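
//DocumentConvertRemote.java
package org.cikaros.convert;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.rmi.RemoteException;

public class DocumentConvertRemote implements IDocumentConvert {

    @Override
    public synchronized byte[] convert(byte[] data) throws RemoteException {
        // A converter is created per call; synchronized keeps conversions one at a time
        IConverter converter = LocalConverter.builder()
                .build();
        try (ByteArrayInputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            converter.convert(in).as(DocumentType.DOCX).to(out).as(DocumentType.PDF).execute();
            return out.toByteArray();
        } catch (IOException e) {
            // Report the failure to the caller instead of returning an empty result
            throw new RemoteException("DOCX to PDF conversion failed", e);
        } finally {
            // Always release the converter, even when the conversion throws
            converter.shutDown();
        }
    }
}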

Using it from another project

...
// The host and port are those of the service deployed above
Registry registry = LocateRegistry.getRegistry(IP, PORT);
IDocumentConvert convert = (IDocumentConvert) registry.lookup(IDocumentConvert.class.getName());
// Adjust the file paths as needed
// Files.readAllBytes reads the whole file; a single InputStream.read call may return fewer bytes
byte[] docx = Files.readAllBytes(DOCX_FILE_PATH);
byte[] pdf = convert.convert(docx);
Files.write(PDF_FILE_PATH, pdf);
...
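
Because the client writes whatever bytes come back from the remote call, a quick sanity check on the result can catch an empty or non-PDF response before it lands on disk. Here is a minimal sketch; the class `PdfBytes` and the method `looksLikePdf` are hypothetical helpers, not part of documents4j or of the service above. It relies only on the fact that PDF files begin with the header `%PDF-`.

```java
import java.nio.charset.StandardCharsets;

class PdfBytes {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if data is non-null and starts with the PDF header "%PDF-". */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In the client, this could be called on the array returned by convert.convert(docx), throwing an exception instead of writing the file when the check fails. It is only a header check, so it will not detect a truncated or otherwise damaged PDF.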

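Pure library dependency

Simply add the dependency to your project. Official website

Pros: good output quality, data does not leave your environment, cross-platform
Cons: enterprise use requires a license and the price is outrageous; the trial version adds a watermark

Setup steps

  1. Prepare a JRE
  2. Call the API wherever appropriate in your project

Code example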

pom.xml

<dependencies>
    <!--        <dependency>-->
    <!--            <groupId>com.aspose</groupId>-->
    <!--            <artifactId>aspose-words</artifactId>-->
    <!--            <version>19.5.0</version>-->
    <!--            <scope>system</scope>-->
    <!--            <systemPath>${basedir}/src/main/resources/aspose-words/aspose-words-19.5jdk.jar</systemPath>-->
    <!--        </dependency>-->
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words</artifactId>
        <version>20.12</version>
        <scope>system</scope>
        <systemPath>${basedir}/src/main/resources/aspose-words/aspose-words-20.12-jdk17.jar</systemPath>
    </dependency>
</dependencies>

<build>
<plugins>
    <plugin>
        <!-- Set the Java version and encoding used to compile the project -->
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.7.0</version>
        <configuration>
            <target>1.8</target>
            <source>1.8</source>
            <encoding>UTF-8</encoding>
        </configuration>
    </plugin>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.1.0</version>
        <configuration>
            <archive>
                <manifest>
                    <mainClass>Main</mainClass> <!-- Fully qualified name of the entry-point class -->
                </manifest>
            </archive>
            <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef> <!-- Suffix of the jar; the generated jar is named like project-1.0-SNAPSHOT-jar-with-dependencies.jar -->
            </descriptorRefs>
        </configuration>
        <!-- With this section in place, you can simply run mvn package or mvn install -->
        <!-- Without it, you need to run mvn package assembly:single -->
        <executions>
            <execution>
                <id>make-assembly</id>
                <phase>package</phase>
                <goals>
                    <goal>single</goal>
                </goals>
            </execution>
        </executions>
    </plugin>
</plugins>
</build>

*.java

import com.aspose.words.Document;
import com.aspose.words.PdfSaveOptions;

import java.io.FileOutputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Main {

    public static void main(String[] args) {
        String docx = "...";
        String pdf = "...";
        try (
                InputStream input = Files.newInputStream(Paths.get(docx));
                FileOutputStream output = new FileOutputStream(pdf);
        ) {
            Document wordDoc = new Document(input);
            PdfSaveOptions pso = new PdfSaveOptions();
            wordDoc.save(output, pso);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}



https://blog.cikaros.top/doc/4bd75a2e.html
Author
Cikaros
Published
April 24, 2024
License