I investigated some issues that caused LibreOffice version 5.1.6.2 to error out when opening certain docx files created with Microsoft Office.
Here's the error:
File format error found at # SAXParseException: '[word/document.xml line 2]: unknown error', Stream 'word/document.xml', Line 2, Column 928831(row,col).
After a deep debugging session, it turns out this is caused by some values of the relativeHeight attributes in the word/document.xml file of the docx.
I wrote a script that works around the relativeHeight issue by setting all relativeHeight attributes to zero, which, according to the docx specification, means infinite.
After fixing this, I ran into another problem where LibreOffice would sometimes duplicate the w:themeColor attribute upon saving in docx format, thereby invalidating the XML. That is also checked and fixed by the code below.
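To see what the two substitutions do in isolation, here is a minimal sketch that applies them to a made-up XML fragment instead of a real document. The sample string and the helper name fix_fragment are assumptions for illustration only. The themeColor pattern is also generalized with a backreference so that it collapses any duplicated value, not just "text1"; I have only actually seen the "text1" case, so treat that generalization as a guess. It uses [^"]* rather than \+ so it does not depend on GNU sed.

```shell
#!/bin/sh
# Sample fragment, made up for illustration (not taken from a real docx).
sample='<wp:anchor relativeHeight="251658240"><w:color w:themeColor="text1" w:themeColor="text1"/></wp:anchor>'

# Hypothetical helper: apply both substitutions to XML read from stdin.
fix_fragment() {
    # 1st expression: force every relativeHeight value to 0.
    # 2nd expression: collapse an attribute that appears twice in a row
    #                 with the same value (assumed generalization of "text1").
    sed -e 's/relativeHeight="[^"]*"/relativeHeight="0"/g' \
        -e 's/\(w:themeColor="[^"]*"\) \1/\1/g'
}

printf '%s\n' "$sample" | fix_fragment
# prints: <wp:anchor relativeHeight="0"><w:color w:themeColor="text1"/></wp:anchor>
```

Because the helper reads from stdin, it can be tried on any snippet copied out of word/document.xml without touching the original file.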
I figured other people might find this useful, so here's my script:
#!/bin/sh
# Workaround for LibreOffice 5 docx issues
# Copyleft 2017 (c) Tom Van Braeckel <tomvanbraeckel@gmail.com>
#
# This fixes these errors I've been getting:
#
# File format error found at
# SAXParseException: '[word/document.xml line 2]: unknown error', Stream 'word/document.xml', Line 2, Column 928831(row,col).
#
# Problematic LibreOffice version:
# --------------------------------
# Version: 5.1.6.2
# Build ID: 1:5.1.6~rc2-0ubuntu1~xenial1
# CPU Threads: 4; OS Version: Linux 4.11; UI Render: default;
# Locale: en-US (en_US.UTF-8); Calc: group
tofix="$1"
if [ -z "$tofix" ]; then
    echo "Usage: $0 <filetofix>"
    echo "Example: $0 bla.docx"
    exit 1
fi
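cwd=$(pwd)
tofixreal=$(readlink -f "$tofix")
# Stop early if the temporary directory or the extraction fails,
# so we never run sed or zip in the wrong place
tempdir=$(mktemp -d) || exit 1
cd "$tempdir" || exit 1
unzip "$tofixreal" || exit 1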
# Fix relativeHeight issue
sed -i "s/relativeHeight=\"[^\"]\+\"/relativeHeight=\"0\"/g" word/document.xml
# and then after saving in LibreOffice 5.2 docx format, we sometimes need this fix:
sed -i 's/w:themeColor="text1" w:themeColor="text1"/w:themeColor="text1"/g' word/document.xml
zip -r "$tofixreal" *
cd "$cwd"
# Remove the temporary extraction directory
rm -rf "$tempdir"
echo "Done! The file $tofixreal has been cleaned of relativeHeight and themeColor issues."
To use this script, save it as fixdocx.sh, make it executable, and run:
./fixdocx.sh filename.docx
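To check whether a file is affected, or whether the fix took effect, something like the following might help. It is only a sketch: the helper name count_nonzero_heights is made up, and it relies on grep -o, which GNU and BSD grep support but POSIX does not require. It counts the relativeHeight attributes whose value is not "0", so a fixed file should report 0.

```shell
#!/bin/sh
# Hypothetical helper: read XML on stdin and print how many
# relativeHeight attributes have a value other than "0".
count_nonzero_heights() {
    grep -o 'relativeHeight="[^"]*"' \
        | grep -v 'relativeHeight="0"' \
        | wc -l \
        | tr -d ' '
}

# If a docx file is given as the first argument, inspect it directly.
# unzip -p writes the named archive member to stdout without extracting it.
if [ -n "${1:-}" ]; then
    unzip -p "$1" word/document.xml | count_nonzero_heights
fi
```

Saved as, say, checkdocx.sh (the name is arbitrary), it can be run as ./checkdocx.sh filename.docx before and after running the fix script.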