Introduction
This is the documentation for tszip, a command line interface and Python API for compressing tskit tree sequence files, such as those produced by msprime, SLiM, fwdpy11 and tsinfer. By building on the zarr and numcodecs packages, tszip achieves much better compression than is possible with generic compression utilities.
The command line interface closely follows the design of gzip, so it should be immediately familiar. Here we compress a large tree sequence representing 1000 Genomes chromosome 20 using tszip and decompress it using tsunzip:
$ ls -lh
total 297M
-rw-r--r-- 1 jk jk 297M May 10 14:49 1kg_chr20.trees
$ tszip 1kg_chr20.trees
$ ls -lh
total 46M
-rw-r--r-- 1 jk jk 46M May 10 14:51 1kg_chr20.trees.tsz
$ tsunzip 1kg_chr20.trees.tsz
$ ls -lh
total 297M
-rw-r--r-- 1 jk jk 297M May 10 14:52 1kg_chr20.trees
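To put the listing in numbers, we can take the sizes reported by ls -lh at face value. They are rounded, so this is only approximate. The compression ratio and space saving for this file then work out as:

```latex
\frac{297\ \text{MiB}}{46\ \text{MiB}} \approx 6.5,
\qquad
1 - \frac{46}{297} \approx 0.85
```

In other words, the .tsz file is roughly 6.5 times smaller than the original, a saving of about 85%. Ratios for other tree sequences may differ.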