
mammoth

Official site: https://github.com/mwilliamson/mammoth.js#readme

Install mammoth:

npm i mammoth -S

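mammoth converts an uploaded Word (.docx) file to HTML, which lets us preview it directly in the browser. The example below uses the Element Plus upload component: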

<template>
  <div>
    <el-upload class="upload-demo" drag action="#" accept=".docx" :auto-upload="false" :on-change="changeFile">
      <el-icon class="el-icon--upload"><upload-filled /></el-icon>
      <div class="el-upload__text">Drop file here or <em>click to upload</em></div>
      <template #tip>
        <div class="el-upload__tip">.docx files only (mammoth does not support .doc)</div>
      </template>
    </el-upload>
    <!-- The HTML produced by mammoth is rendered as-is -->
    <div v-html="wordcontent"></div>
  </div>
</template>
<script lang="ts" setup>
import { ref } from 'vue'
import type { UploadProps } from 'element-plus'
import { UploadFilled } from '@element-plus/icons-vue'
import mammoth from 'mammoth'

// HTML shown by the v-html element above
const wordcontent = ref('')

const changeFile: UploadProps['onChange'] = (uploadFile) => {
  const file = uploadFile.raw // the native File object
  if (!file) return
  const myReader = new FileReader()
  // 'load' fires only on success, so result is guaranteed to be set here
  myReader.addEventListener('load', () => {
    const buffer = myReader.result as ArrayBuffer // ArrayBuffer because readAsArrayBuffer is used
    mammoth
      .convertToHtml({ arrayBuffer: buffer })
      .then((result: { value: string; messages: any[] }) => {
        wordcontent.value = result.value // the generated HTML
        if (result.messages.length) console.warn(result.messages) // any warnings during conversion
      })
      .catch((err) => console.error(err))
  })
  myReader.readAsArrayBuffer(file)
}
</script>

For the full API, see the official repository.
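The FileReader callback in the component can also be written with the Promise-based `Blob.arrayBuffer()` method. The sketch below is a small helper that reads the uploaded file and runs a converter, logging any conversion messages instead of discarding them. `docxFileToHtml` and the `DocxConverter` type are hypothetical names introduced for illustration. The converter is passed in as a parameter (expected to be mammoth's `convertToHtml`) so the helper can be exercised without mammoth installed. mammoth's own typings may differ slightly from the result shape assumed here.

```typescript
// Assumed shape of a mammoth-style result: value = HTML, messages = warnings/errors.
interface ConversionMessage {
  type: string
  message?: string
}
interface ConversionResult {
  value: string
  messages: ConversionMessage[]
}
// Hypothetical converter type; a wrapper around mammoth.convertToHtml is expected to fit it.
type DocxConverter = (input: { arrayBuffer: ArrayBuffer }) => Promise<ConversionResult>

// Reads the uploaded file, converts it, and logs any conversion messages.
async function docxFileToHtml(file: Blob, convert: DocxConverter): Promise<string> {
  const arrayBuffer = await file.arrayBuffer() // Promise-based alternative to FileReader
  const result = await convert({ arrayBuffer })
  for (const m of result.messages) {
    console.warn(`[${m.type}] ${m.message ?? ''}`)
  }
  return result.value
}
```

Inside the component, the handler could then become `async (uploadFile) => { if (uploadFile.raw) wordcontent.value = await docxFileToHtml(uploadFile.raw, (input) => mammoth.convertToHtml(input)) }`.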

docx4js

Official site: https://github.com/lalalic/docx4js

Here we use docx4js to find out how many pages a .docx file has.

Install docx4js:

npm i docx4js -S

Create a file named docx.mjs with the following code:

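import docx4js from 'docx4js'
import { TextDecoder } from 'util'

docx4js.docx
  .load('./test2.docx')
  .then((doc) => {
    // docProps/app.xml holds the "extended properties" written by the authoring application
    const part = doc.parts['docProps/app.xml']
    if (!part) {
      console.log('docProps/app.xml not found')
      return
    }
    const propsAppRaw = part._data.getContent() // raw bytes of the XML part
    const propsApp = new TextDecoder('utf-8').decode(propsAppRaw)
    console.log(propsApp)
    // The page count stored in the <Pages> element, if present
    const match = propsApp.match(/<Pages>(\d+)<\/Pages>/)
    if (match && match[1]) {
      const count = Number(match[1])
      console.log(count)
    }
  })
  .catch((err) => console.error(err))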

Run: node docx.mjs

The results appear in the console. The first console.log prints the raw content of docProps/app.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"><Template>Normal.dotm</Template><Pages>4</Pages><Words>996</Words><Characters>1013</Characters><Lines>0</Lines><Paragraphs>0</Paragraphs><TotalTime>0</TotalTime><ScaleCrop>false</ScaleCrop><LinksUpToDate>false</LinksUpToDate><CharactersWithSpaces>1250</CharactersWithSpaces><Application>WPS Office_12.1.0.18543_F1E327BC-269C-435d-A152-05C5408002CA</Application><DocSecurity>0</DocSecurity></Properties>

The second prints:

4

The 4 means that this Word document has 4 pages.
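In the script above, docx4js is used only to reach the docProps/app.xml part. A .docx file is a ZIP package (Office Open XML), so that part can also be read with Node's built-in zlib. The sketch below is a minimal ZIP entry reader, not a full ZIP implementation. `readZipEntry` is a hypothetical helper name. It assumes no ZIP64, no encryption, and entries that are either stored (method 0) or deflated (method 8).

```typescript
import * as zlib from 'node:zlib'

// Minimal sketch: return the decompressed bytes of one ZIP entry, or null if absent.
// Assumptions: no ZIP64, no encryption, compression method 0 (stored) or 8 (deflate).
function readZipEntry(zip: Buffer, name: string): Buffer | null {
  // Find the End Of Central Directory record (22 bytes + optional comment) from the end.
  let eocd = -1
  for (let i = zip.length - 22; i >= Math.max(0, zip.length - 22 - 0xffff); i--) {
    if (zip.readUInt32LE(i) === 0x06054b50) { eocd = i; break }
  }
  if (eocd < 0) throw new Error('not a ZIP file')
  const entryCount = zip.readUInt16LE(eocd + 10) // total number of entries
  let p = zip.readUInt32LE(eocd + 16) // offset of the central directory
  for (let n = 0; n < entryCount; n++) {
    if (zip.readUInt32LE(p) !== 0x02014b50) throw new Error('bad central directory entry')
    const method = zip.readUInt16LE(p + 10)
    const compressedSize = zip.readUInt32LE(p + 20)
    const nameLen = zip.readUInt16LE(p + 28)
    const extraLen = zip.readUInt16LE(p + 30)
    const commentLen = zip.readUInt16LE(p + 32)
    const localOffset = zip.readUInt32LE(p + 42)
    const entryName = zip.toString('utf8', p + 46, p + 46 + nameLen)
    if (entryName === name) {
      // The local header has its own name/extra lengths; the data follows them.
      const localNameLen = zip.readUInt16LE(localOffset + 26)
      const localExtraLen = zip.readUInt16LE(localOffset + 28)
      const start = localOffset + 30 + localNameLen + localExtraLen
      const data = zip.subarray(start, start + compressedSize)
      if (method === 0) return data
      if (method === 8) return zlib.inflateRawSync(data)
      throw new Error(`unsupported compression method ${method}`)
    }
    p += 46 + nameLen + extraLen + commentLen // advance to the next central directory entry
  }
  return null
}
```

With `import { readFileSync } from 'node:fs'`, the XML from the example could be obtained as `readZipEntry(readFileSync('./test2.docx'), 'docProps/app.xml')?.toString('utf8')`. Like the docx4js version, this only works in Node and only for .docx files.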

However, some Word documents produce a different result, for example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ap:Properties xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes" xmlns:ap="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"/>

This file contains no Pages element at all, so no page count can be read from it.
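The regex in docx.mjs also expects an unprefixed `<Pages>` element. The second file, however, uses the `ap:` namespace prefix on its root element. If a file from such a producer did contain a page count, it would presumably appear as `<ap:Pages>`, which that regex would not match. The sketch below is a slightly more tolerant parser that accepts an optional prefix and returns `null` when the element is missing, so callers must handle the "unknown" case explicitly. `parsePageCount` is a hypothetical helper name, and the prefix handling is an assumption, not something the original post observed.

```typescript
// Minimal sketch: extract the page count from the XML text of docProps/app.xml.
// Accepts an optional namespace prefix (e.g. <ap:Pages>); returns null when absent.
function parsePageCount(appXml: string): number | null {
  const match = appXml.match(/<(?:[\w.-]+:)?Pages>\s*(\d+)\s*<\/(?:[\w.-]+:)?Pages>/)
  return match ? Number(match[1]) : null
}
```

With the first file's XML this returns 4; with the second file's XML it returns null.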

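In summary, the page count obtained through docx4js is not reliable. It is only the value stored in docProps/app.xml by the application that saved the file, and some files do not contain it at all. In addition, this approach has to run in a Node environment and only works with .docx files; the older .doc format is not supported.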
