Fixing corrupted character encoding in TAR format Postgres backup

So I’m restoring a backup of a Postgres database when this old familiar error occurs:

pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0x96

This is a common issue with Postgres which I’ve written about before but this time I can’t edit the data manually because it is in pg_dump -Ft “tar” format.

There are a few guides around the internet offering solutions to the problem but they mostly suggest fixing the encoding in the original database and backing it up again, preferably in SQL format.

However what do you do when the original server is destroyed. I mean that is what backups are for right? I’m kinda panicking at this point because I don’t have access to the original database anymore. But I’ve got backups, I’m good right? – Wrong.

So after some nail-biting, I have found a way to fix this. The procedure is a little complex but not too bad and certainly better than the alternative of losing all your data.

 

Step 1 – Create a working directory

Create a directory inside /tmp. It must be done in /tmp or some other globally-writable directory.

mkdir /tmp/pg
cd /tmp/pg

 

Step 2 – Untar the backup

tar xf ~/mybackup.tar
chmod 644 *

You’ll get a zero-block warning from tar. Don’t worry about it, a bug in pg_dump writes out an incorrect tar format. You also have to set the permissions to make the files globally readable or Postgres won’t see them.

 

Step 3 – Fix up the paths

In the file restore.sql, replace all occurrences of $$PATH$$ with /tmp/pg

sed -i 's/\$\$PATH\$\$/\/tmp\/pg/g' restore.sql

 

Step 4 – Fix all the badly encoded files

Your data is now in plain-text format so you could fix it by hand but here’s a little bash script to do it automatically:

for i in *.dat; do mv $i tmp.dat; iconv -f 8859_1 -t UTF-8 tmp.dat > $i; done

 

Step 5 - Restore the database

The backup is now in a state where it can be restored. I don’t trust pg_restore anymore so I’ll use the client to do it:

psql mydatabase -f restore.sql

 …and there you have it. It worked for me, hopefully it will work for you too!