Discussion:
Pipe to tar or other compressor
Peter
2011-10-16 08:50:57 UTC
How do you pipe to something that will compress well?

I've tried this:

mysqldump {opts} | 7za a -t7z -an -si -so > outfile.7z

doesn't work because 7z needs to seek

mysqldump {opts} | 7za a -txz -an -si -so > outfile.7z

doesn't work because 7za doesn't recognise xz

mysqldump {opts} | tar -cf --xz - > outfile.7z

ditto

mysqldump {opts} | tar -cf --lzma - > outfile.7z

doesn't work because tar expects a directory

Two steps work for the last method, but this is Linux; it must be doable in one step?

Peter


_______________________________________________
NZLUG mailing list ***@linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Hadley Rich
2011-10-16 08:46:29 UTC
Post by Peter
How do you pipe to something that will compress well?
[...]
Post by Peter
mysqldump {opts} | tar -cf --lzma - > outfile.7z
doesn't work because tar expects a directory
mysqldump {opts} | gzip > foo.sql.gz

hads
--
http://nicegear.co.nz
Open Source Hardware Supplier.

Peter
2011-10-16 09:01:20 UTC
Yes, well, that was my original solution, but the archive is now over the 20MB email limit, and 7z/lzma reduces it to 5MB (not bad for a 100MB dump), hence my reason to avoid gzip.

Cheers

Peter
Post by Hadley Rich
Post by Peter
How do you pipe to something that will compress well?
[...]
Post by Peter
mysqldump {opts} | tar -cf --lzma - > outfile.7z
doesn't work because tar expects a directory
mysqldump {opts} | gzip > foo.sql.gz
hads
Atom Smasher
2011-10-16 09:31:04 UTC
Post by Peter
mysqldump {opts} | tar -cf --lzma - > outfile.7z
doesn't work because tar expects a directory
Two steps work for the last method, but this is Linux; it must be doable in one step?
=============

tar is a "tape archiver". it supports compression, but the result is no
different than compressing a tar file.

try piping through bzip2... or create a temp-file, and compress it however
you like....

one step or two... it often depends on how unreadable you want to make the
command. there's nothing wrong with doing it in two steps, if it works.
--
...atom

________________________
http://atom.smasher.org/
762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
-------------------------------------------------

"Our enemies are innovative and resourceful, and so
are we. They never stop thinking about new ways to
harm our country and our people, and neither do we"
-- George "dubya" Bush, 5 Aug 2004


Daniel Pittman
2011-10-16 16:57:14 UTC
Post by Peter
How do you pipe to something that will compress well?
mysqldump | {xz, bzip2, gzip}
Post by Peter
 mysqldump {opts} | 7za a -t7z -an -si -so > outfile.7z
doesn't work because 7z needs to seek
Yah. Also, if you read the documentation, the 7z format is explicitly
not recommended for backups on Unix because it doesn't preserve file
metadata such as owner and group. If you needed to collate multiple
files, you would be better off using tar with xz compression. Since you
have only one, plain xz would be the best choice. :)
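
A minimal sketch of the single-stream approach described above, assuming a POSIX shell with gzip installed. `fake_dump` is a hypothetical stand-in for `mysqldump {opts}` so the snippet runs without a database; swap `gzip` for `xz` or `bzip2` where those are available.

```shell
#!/bin/sh
# fake_dump is a hypothetical stand-in for "mysqldump {opts}".
fake_dump() {
    i=1
    while [ "$i" -le 1000 ]; do
        echo "INSERT INTO people VALUES ($i, 'name$i');"
        i=$((i + 1))
    done
}

workdir=$(mktemp -d)              # scratch directory; remove it when done
out="$workdir/dump.sql.gz"

# One step: the compressor reads stdin and writes stdout, so it never
# needs to seek in its output.
fake_dump | gzip > "$out"

# Check the compressed file, then count the lines that come back out.
gzip -t "$out" && restored=$(gzip -dc "$out" | wc -l | tr -d ' ')
echo "restored lines: $restored"
```

Because a stream compressor is only a filter, the same pipeline shape should work with any of the three tools named above.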

Daniel
--
♲ Made with 100 percent post-consumer electrons

Daniel Lawson
2011-10-16 18:47:17 UTC
Post by Peter
mysqldump {opts} | tar -cf --lzma - > outfile.7z
doesn't work because tar expects a directory
Two steps work for the last method, but this is Linux; it must be doable in one step?
tar --lzma -cf - <(mysqldump ${opts} ) > outfile.7z


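This uses process substitution: the shell exposes mysqldump's output as a
path (/dev/fd/N on Linux, a named pipe on some other systems) that tar is
handed as a file name. Check the resulting archive, though: tar may store
the pipe entry itself rather than the data flowing through it.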

Steve Holdoway
2011-10-16 20:06:38 UTC
Post by Daniel Lawson
Post by Peter
mysqldump {opts} | tar -cf --lzma - > outfile.7z
doesn't work because tar expects a directory
Two steps work for the last method, but this is Linux; it must be doable in one step?
tar --lzma -cf - <(mysqldump ${opts} ) > outfile.7z
This opens up a named pipe, with the output of mysqldump being piped
through it. tar sees this as a filedescriptor it can open.
<slightly irrelevant>
In my experience, performing these two operations concurrently can place
an unacceptably large load on a production server, as well as locking up
the database ( if you're taking a snapshot ) far longer than absolutely
necessary.

The only time I do this as a single operation is if I'm really short on
disk space.

</irrelevance>

Steve
--
Steve Holdoway BSc(Hons) MNZCS <***@greengecko.co.nz>
http://www.greengecko.co.nz
MSN: ***@greengecko.co.nz
Skype: sholdowa
Peter
2011-10-16 22:05:47 UTC
Thank you all; as usual, one learns a lot along the way.

FTR the solution adopted is:

/usr/bin/mysqldump {etc} > /home/tasks/backup/clientname-db-backup
/bin/bzip2 -f /home/tasks/backup/clientname-db-backup

For xz on Debian you need squeeze or later (package name xz-utils), although it is also available for lenny from backports.
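
The two-step approach above can be sketched with a status check between the steps, so a failed dump is never compressed. This is only an illustration under assumptions: `fake_dump` is a hypothetical stand-in for `/usr/bin/mysqldump {etc}`, a temporary directory replaces `/home/tasks/backup`, and gzip is used as a fallback if bzip2 is not installed.

```shell
#!/bin/sh
# fake_dump is a hypothetical stand-in for "/usr/bin/mysqldump {etc}".
fake_dump() {
    printf 'CREATE TABLE t (id INT);\nINSERT INTO t VALUES (1);\n'
}

backup_dir=$(mktemp -d)           # the thread uses /home/tasks/backup
dump_file="$backup_dir/clientname-db-backup"

# Use bzip2 as in the thread; fall back to gzip if it is missing.
compressor=bzip2; ext=bz2
command -v "$compressor" >/dev/null 2>&1 || { compressor=gzip; ext=gz; }

# Step 1: write the dump to disk, so the database is busy only while dumping.
# Step 2: compress only if the dump succeeded; -f overwrites an older archive.
if fake_dump > "$dump_file"; then
    "$compressor" -f "$dump_file"
    status=$?
else
    status=1
    echo "dump failed; not compressing" >&2
fi
echo "exit status: $status, archive: $dump_file.$ext"
```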

Peter


David L Neil
2011-10-16 22:13:23 UTC
Peter,
Post by Peter
/usr/bin/mysqldump {etc} > /home/tasks/backup/clientname-db-backup
/bin/bzip2 -f /home/tasks/backup/clientname-db-backup
=it trades storage space for simplicity and indeed faster recovery from
some failure in step two!

=does this also mean that you have performed side-by-side comparison on
a range of compression algorithms/pgms and settled on bzip2 for
MySQLdump/ASCII files?
(instead of (my-traditional) tar...)
--
Regards,
=dn

Martin D Kealey
2011-10-16 22:20:25 UTC
=does this also mean that you have performed side-by-side comparison on a
range of compression algorithms/pgms and settled on bzip2 for MySQLdump/ASCII
files?
(instead of (my-traditional) tar...)
Tar is *not* a compressor, it's an archiver with NO built-in compression.
Ditto cpio, ditto pax.

The GNU versions have some fancy extra features to invoke a (small)
selection of external compressors; "tar -z ..." is functionally identical to
"tar ... | gzip", and "tar -j ..." is to "tar ... | bzip2", except of course
that they automatically turn the pipelines around for extraction.
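
A small sketch of that equivalence, assuming a tar that understands -z (GNU tar and bsdtar both do) and gzip on the PATH. The directory and file names are made up for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
mkdir "$workdir/dumps"
echo "INSERT INTO a VALUES (1);" > "$workdir/dumps/a.sql"
echo "INSERT INTO b VALUES (2);" > "$workdir/dumps/b.sql"
cd "$workdir" || exit 1

# Built-in shortcut: tar runs gzip for you.
tar -czf builtin.tar.gz dumps

# Explicit pipeline: tar writes the archive to stdout, gzip compresses it.
tar -cf - dumps | gzip > piped.tar.gz

# Both decompress to a tar stream listing the same members.
gzip -dc builtin.tar.gz | tar -tf - | sort > builtin.list
gzip -dc piped.tar.gz   | tar -tf - | sort > piped.list
cmp builtin.list piped.list && echo "same members"
```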

-Martin

-- The early bird may get the worm, but it's the second mouse that gets the cheese.

Cliff Pratt
2011-10-17 01:38:53 UTC
Post by David L Neil
Peter,
Post by Peter
/usr/bin/mysqldump {etc} > /home/tasks/backup/clientname-db-backup
/bin/bzip2 -f /home/tasks/backup/clientname-db-backup
=it trades storage space for simplicity and indeed faster recovery from
some failure in step two!
=does this also mean that you have performed side-by-side comparison on
a range of compression algorithms/pgms and settled on bzip2 for
MySQLdump/ASCII files?
(instead of (my-traditional) tar...)
Does tar compress? tar with -z does, but I thought that it just used
gzip with the -z option.

Cheers,

Cliff

David L Neil
2011-10-17 01:53:55 UTC
Post by Cliff Pratt
Post by David L Neil
Post by Peter
/usr/bin/mysqldump {etc} > /home/tasks/backup/clientname-db-backup
/bin/bzip2 -f /home/tasks/backup/clientname-db-backup
=it trades storage space for simplicity and indeed faster recovery from
some failure in step two!
=does this also mean that you have performed side-by-side comparison on
a range of compression algorithms/pgms and settled on bzip2 for
MySQLdump/ASCII files?
(instead of (my-traditional) tar...)
Does tar compress? tar with -z does, but I thought that it just used
gzip with the -z option.
=apologies:
my description of current practice was slightly misleading and has
unfortunately distracted people from the pertinent question and the OP's
topic and objective (from which I'd like to benefit).

=so to clarify:
my using tar does NOT have the specific objective of compression. It
conveniently creates one handy tar-ball of a set of dump-files. I
probably copied this idea from 'somewhere', eg a book; without thinking
through the next/obvious further possibility, as raised by the OP...

=and now, back to my actual request:
has anyone checked which compression tool might do better than 'the
others' when working on the likes of MySQLdump files?
(and ideally (and diverging from the OP's spec), I'd prefer that it
work on a set of files, not just one at a time)
--
Regards,
=dn

Jethro Carr
2011-10-17 02:11:59 UTC
Post by David L Neil
has anyone checked which compression tool might do better than 'the
others' when working on the likes of MySQLdump files?
It would depend on the type of data contained in the SQL databases - for
example, if the database is primarily records vs binary blobs.....

Unfortunately it's kind of hard to say "compression XYZ will always be
best for this usage", since everyone's needs are different.

As a general rule of thumb, bzip2 is better than gzip for compression,
but worse for speed. (although with modern CPUs, it's less of a concern)
Post by David L Neil
(and ideally (and diverging from the OP's spec), I'd prefer that it
work on a set of files, not just one at a time)
That's more a tar issue than a compressor issue.

regards,
jethro
--
Jethro Carr
www.jethrocarr.com
www.amberdms.com
Michael Field
2011-10-17 02:34:06 UTC
Post by David L Neil
has anyone checked which compression tool might do better than 'the
others' when working on the likes of MySQLdump files?
(and ideally (and diverging from the OP's spec), I'd prefer that it
work on a set of files, not just one at a time)
Hi,

In general, de-duplication and renormalisation of data at the database layer will make more of a difference than the choice of compression tool, but one underappreciated option is exploiting the database's indexing to present data in the most compressible order.

For example, dumping 'people' records in name order, or in day/month/year of birth, will put more similar substrings together, as will dumping a list of sales order items in product code order if the product code is a sizable character string.

The closest thing in mysqldump is "--order-by-primary", which is primarily aimed to make importing the data quicker (as you are only ever going to add to the end of the primary index). If you do try it, I would be interested in hearing what difference (if any) it makes - it will probably suck. :-)
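
The ordering idea can be tried without a database. The sketch below is an illustration only: it generates made-up name/city rows with awk (not mysqldump output), compresses them in pseudo-random order and in sorted order with gzip, and prints both sizes. It says nothing about what "--order-by-primary" does to a real dump.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical sample rows: 20000 name/city pairs in pseudo-random order.
# srand(42) keeps the run repeatable on a given awk implementation.
awk 'BEGIN {
    srand(42)
    for (i = 0; i < 20000; i++)
        printf "name_%04d,city_%04d\n", int(rand() * 1000), int(rand() * 100)
}' > "$workdir/unsorted.csv"

# Same rows, ordered so that similar strings sit next to each other.
sort "$workdir/unsorted.csv" > "$workdir/sorted.csv"

unsorted_size=$(gzip -c "$workdir/unsorted.csv" | wc -c | tr -d ' ')
sorted_size=$(gzip -c "$workdir/sorted.csv" | wc -c | tr -d ' ')
echo "unsorted: $unsorted_size bytes, sorted: $sorted_size bytes"
```

On data shaped like this the sorted file should come out smaller, because each row shares a long prefix with its neighbour; how much a real dump gains depends entirely on its contents.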

Mike


Jethro Carr
2011-10-17 02:09:52 UTC
Post by Cliff Pratt
Does tar compress? tar with -z does, but I thought that it just used
gzip with the -z option.
Correct, tar itself has no compression, tar is more like a container for
holding files.


You can create compressed tar files with various command line options to
invoke external compressors like gzip (-z) or bzip2 (-j).

You can also do the same by piping the output from tar into the
appropriate compression program, but the command options just make
things a little easier. :-)


regards,
jethro
--
Jethro Carr
www.jethrocarr.com
www.amberdms.com
Cliff Pratt
2011-10-17 03:17:34 UTC
Post by Jethro Carr
Post by Cliff Pratt
Does tar compress? tar with -z does, but I thought that it just used
gzip with the -z option.
Correct, tar itself has no compression, tar is more like a container for
holding files.
You can create compressed tar files with various command line options to
invoke external compressors like gzip (-z) or bzip2 (-j).
You can also do the same by piping the output from tar into the
appropriate compression program, but the command options just make
things a little easier. :-)
Here's another way - from 'man tar':

-I, --use-compress-program PROG
filter through PROG (must accept -d)
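
A hedged sketch of that option in use, spelled with the long form from the man page excerpt. It assumes a tar that supports --use-compress-program, and it picks xz when installed or gzip otherwise (both accept -d). File names are made up for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
mkdir "$workdir/dumps"
echo "INSERT INTO a VALUES (1);" > "$workdir/dumps/a.sql"
cd "$workdir" || exit 1

# Prefer xz when installed; otherwise fall back to gzip.
prog=gzip; ext=gz
command -v xz >/dev/null 2>&1 && { prog=xz; ext=xz; }

# Create: tar pipes the archive through "$prog".
tar --use-compress-program "$prog" -cf "dumps.tar.$ext" dumps

# List: tar runs "$prog -d" to reverse the filter before reading.
tar --use-compress-program "$prog" -tf "dumps.tar.$ext" > members.list
cat members.list
```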

Cheers,

Cliff

Jim Cheetham
2011-10-16 23:13:39 UTC
Post by Peter
How do you pipe to something that will compress well?
Fundamentally, you don't. If you want to achieve 'maximum
compression', you need to have the entire corpus available to examine
before you can start. Stream compressors are a tradeoff of size
against utility and speed. Additionally, different compression
techniques will have different effectiveness levels against different
types of input.

The best way to choose for your data is to test everything :-) The
various Linux magazines occasionally hold comparisons for you ...
http://www.linuxjournal.com/article/8051
http://www.linuxjournal.com/article/9370
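
A minimal way to "test everything" on your own data, sketched under assumptions: the sample file here is generated filler rather than a real dump, and only the compressors that happen to be installed are tried. Point `sample` at a real dump file to get numbers that mean something.

```shell
#!/bin/sh
workdir=$(mktemp -d)
sample="$workdir/sample.sql"

# Hypothetical sample data; in practice use a real dump file instead.
i=1
while [ "$i" -le 5000 ]; do
    echo "INSERT INTO orders VALUES ($i, 'customer$((i % 97))', 'product$((i % 13))');"
    i=$((i + 1))
done > "$sample"

original_size=$(wc -c < "$sample" | tr -d ' ')
echo "original: $original_size bytes"

# Try every compressor that is installed and report the output size.
tested=0
for tool in gzip bzip2 xz; do
    command -v "$tool" >/dev/null 2>&1 || continue
    size=$("$tool" -c "$sample" | wc -c | tr -d ' ')
    echo "$tool: $size bytes"
    tested=$((tested + 1))
done
```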

-jim

Volker Kuhlmann
2011-10-18 09:15:21 UTC
Post by Peter
How do you pipe to something that will compress well?
mysqldump {opts} | 7za a -t7z -an -si -so > outfile.7z
doesn't work because 7z needs to seek
mysqldump {opts} | 7za a -txz -an -si -so > outfile.7z
[...]

You seriously need to distinguish between an "archiver" and a
"compressor". An archiver takes multiple files and stuffs them into a
data stream. Examples: tar, and all the mickeymouse stuff you don't want
to have anything to do with, like zip, arj, 7z. A compressor is a
(conceptually simple) filter that applies a transformation to a data
stream to reduce its size without losing information (or, as the jpeg
case may be, with losing information).

For your convenience, tar (an archiver!!!) allows you to also pipe the
data stream through a compressor or decompressor.

Examples of compressors are compress (you'll have to be old enough to
have used it), gzip, bzip2, lzma. There is a tradeoff between
compression ratio and CPU load, they both go up together. The newer
programs are high in both. lzma blows your mind with its compression
ratio, unfortunately also with its computational requirements. With many
programs you can influence the tradeoff somewhat, and also the memory
requirements (higher memory use yields better compression).

You choose the program/format as per your requirements. mysqldump
produces truckloads of ASCII data in a single stream, so you don't want
an archiver, only a compressor. All of them work well on ASCII data.
lzma might finish tomorrow, the bonus is that there's no data left to
have to deal with...
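
The ratio-versus-effort knob mentioned above can be seen with gzip's level flags. This is a sketch on generated filler data, not a benchmark; the table and host names are invented. bzip2 and xz also accept -1 through -9.

```shell
#!/bin/sh
workdir=$(mktemp -d)
sample="$workdir/sample.sql"

# Hypothetical sample data standing in for a real mysqldump file.
i=1
while [ "$i" -le 5000 ]; do
    echo "INSERT INTO log VALUES ($i, 'event$((i % 31))', 'host$((i % 7)).example');"
    i=$((i + 1))
done > "$sample"

# -1 favours speed, -9 favours size.
fast_size=$(gzip -1 -c "$sample" | wc -c | tr -d ' ')
best_size=$(gzip -9 -c "$sample" | wc -c | tr -d ' ')
echo "gzip -1: $fast_size bytes, gzip -9: $best_size bytes"
```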

Volker
--
Volker Kuhlmann is list0570 with the domain in header.
http://volker.dnsalias.net/ Please do not CC list postings to me.
