docx2txt
Convert Microsoft OOXML files to plain text
Install
- All systems
  curl cmd.cat/docx2txt.sh
- Debian
  apt-get install docx2txt
- Ubuntu
  apt-get install docx2txt
- Arch Linux
  pacman -S docx2txt
- Kali Linux
  apt-get install docx2txt
- Windows (WSL2)
  sudo apt-get update
  sudo apt-get install docx2txt
- OS X
  brew install docx2txt
- Raspbian
  apt-get install docx2txt
- Dockerfile
  dockerfile.run/docx2txt
docx2txt generates plain (ASCII) text files from Microsoft .docx documents. It preserves some formatting and document information that Microsoft's own text conversion drops, and it applies character conversions suited to ASCII output. The tool is platform independent: it consists of a core Perl script, Unix and Windows shell wrapper scripts, and a configuration file that controls the appearance of the output text to a fair extent. It is convenient as the backend of a web-based docx conversion service. Makefiles and Windows batch files are provided for installing the scripts. Combined with an unzipper that can handle corrupt Zip archives, such as CakeCmd, it can in many cases extract text from corrupt docx documents that Microsoft Word fails to even open.
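As an illustration of typical use, the following is a minimal sketch of a batch conversion helper. It assumes the installed command is named docx2txt and accepts an input path followed by an output path; the command name and argument form can differ between packages, so check what your installation provides. The function name convert_all and the DOCX2TXT override variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Convert every .docx file in a directory to a .txt file next to it.
# Assumption: the converter is invoked as "<command> <input.docx> <output.txt>".
# DOCX2TXT is a hypothetical override for the command name; it is not part
# of docx2txt itself.

convert_all() {
    dir=${1:-.}
    converter=${DOCX2TXT:-docx2txt}
    count=0
    for src in "$dir"/*.docx; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$src" ] || continue
        dest="${src%.docx}.txt"
        # Discard converter stdout so only the final count is printed.
        if "$converter" "$src" "$dest" >/dev/null; then
            count=$((count + 1))
        else
            echo "failed to convert: $src" >&2
        fi
    done
    # Print the number of files converted successfully.
    echo "$count"
}
```

Calling `convert_all path/to/documents` prints how many files were converted and reports failures on standard error. Keeping the command name in a variable makes it easy to point the helper at a differently named script, such as a wrapper installed by another package.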