[Linux] Parallel compression

Parallel compression

pbzip2: a parallel implementation of bzip2

https://launchpad.net/pbzip2

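: PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer (that is, anything compressed with pbzip2 can be decompressed with bzip2). PBZIP2 should work on any system that has a pthreads-compatible C++ compiler (such as gcc).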
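
The compatibility claim above can be checked with a quick round trip. This is only a minimal sketch, assuming both pbzip2 and bzip2 are on the PATH; the scratch directory and the file name sample.txt are made up for illustration, and the check is skipped when either tool is missing.

```shell
# Round trip: compress with pbzip2, decompress with serial bzip2, then compare.
# Assumption: sample.txt and the scratch directory are illustrative names only.
workdir=$(mktemp -d)
roundtrip_status="skipped"   # stays "skipped" when either tool is missing

if command -v pbzip2 >/dev/null 2>&1 && command -v bzip2 >/dev/null 2>&1; then
    # Build a small, compressible sample file.
    seq 1 100000 > "$workdir/sample.txt"

    # -c writes the compressed stream to stdout and leaves the input in place.
    pbzip2 -c "$workdir/sample.txt" > "$workdir/sample.txt.bz2"

    # Decompress with bzip2 (not pbzip2) and compare byte for byte.
    if bzip2 -dc "$workdir/sample.txt.bz2" | cmp -s - "$workdir/sample.txt"; then
        roundtrip_status="match"
    else
        roundtrip_status="mismatch"
    fi
fi

echo "round trip: $roundtrip_status"
rm -rf "$workdir"
```

If both tools are installed, the script should print "round trip: match".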

Installing the pbzip2 package

yum install -y pbzip2

$ yum install -y pbzip2

Creating a 10 GB file

dd if=/dev/urandom of=file_10GB count=1024 bs=10M

$ dd if=/dev/urandom of=file_10GB count=1024 bs=10M
1024+0 records in
1024+0 records out
10737418240 bytes (11 GB) copied, 91.2048 s, 118 MB/s


$ ls -lh file_10GB
-rw-r--r-- 1 root root 10G Mar 24 17:42 file_10GB
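
The byte count that dd reports follows directly from count × bs, where the M suffix in bs=10M means 1024 × 1024 bytes in GNU dd. The short check below reproduces it with shell arithmetic; the variable names are illustrative. It also explains the two different labels in the output: dd prints the size in decimal units (11 GB), while ls -h uses powers of 1024 (10G).

```shell
# dd writes `count` blocks of `bs` bytes each.
# In GNU dd, bs=10M means 10 * 1024 * 1024 bytes per block.
count=1024
bs_bytes=$((10 * 1024 * 1024))
total_bytes=$((count * bs_bytes))
echo "$total_bytes"   # 10737418240, the figure dd printed
```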

Compressing with tar

tar --use-compress-prog=pbzip2 -cf file10g-pbzip2.tar.bz2 file_10GB

$ tar --use-compress-prog=pbzip2 -cf file10g-pbzip2.tar.bz2 file_10GB

CPU usage

Extracting with tar

tar --use-compress-prog=pbzip2 -xf file10g-pbzip2.tar.bz2

$ tar --use-compress-prog=pbzip2 -xf file10g-pbzip2.tar.bz2
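
The same archive can also be produced and extracted through a pipe, which makes it easy to hand pbzip2 its own options such as -p. The following is a sketch under stated assumptions rather than a definitive recipe: demo_dir, numbers.txt and demo.tar.bz2 are made-up names, and the script falls back to serial bzip2 (without -p) when pbzip2 is not installed.

```shell
# Pipe-based equivalent of tar --use-compress-prog=pbzip2, with an explicit -p.
# Assumptions: demo_dir, numbers.txt and demo.tar.bz2 are illustrative names.
workdir=$(mktemp -d)
mkdir "$workdir/demo_dir" "$workdir/restored"
seq 1 50000 > "$workdir/demo_dir/numbers.txt"
pipe_status="skipped"
tool=""
opts=""

if command -v pbzip2 >/dev/null 2>&1; then
    tool=pbzip2
    opts="-p2"                    # -p2: use two processors
elif command -v bzip2 >/dev/null 2>&1; then
    tool=bzip2                    # serial fallback, no -p option
fi

if [ -n "$tool" ]; then
    # Create: tar writes the archive to stdout, the compressor reads stdin.
    # $opts is left unquoted on purpose so an empty value expands to nothing.
    tar -C "$workdir" -cf - demo_dir | "$tool" $opts -c > "$workdir/demo.tar.bz2"

    # Extract: decompress to stdout and let tar read the archive from stdin.
    "$tool" -dc "$workdir/demo.tar.bz2" | tar -C "$workdir/restored" -xf -

    if cmp -s "$workdir/demo_dir/numbers.txt" "$workdir/restored/demo_dir/numbers.txt"; then
        pipe_status="match"
    else
        pipe_status="mismatch"
    fi
fi

echo "pipe round trip: $pipe_status"
rm -rf "$workdir"
```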

Compressing a file with pbzip2

pbzip2 -kv -p2 file1g-pigz
$ pbzip2 -kv -p2 file1g-pigz
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward

         # CPUs: 2
 BWT Block Size: 900 KB
File Block Size: 900 KB
 Maximum Memory: 100 MB
-------------------------------------------
         File #: 1 of 1
     Input Name: file1g-pigz
    Output Name: file1g-pigz.bz2

     Input Size: 1073741824 bytes
Compressing data...
    Output Size: 1078563441 bytes
-------------------------------------------

     Wall Clock: 70.809066 seconds
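
Note that the output is slightly larger than the input. That is expected for data read from /dev/urandom, which is effectively incompressible, so this run measures speed rather than compression ratio. The sketch below derives the ratio and an approximate throughput from the figures printed above; the variable names are illustrative.

```shell
# Figures copied from the pbzip2 -v output.
input_bytes=1073741824
output_bytes=1078563441
wall_seconds=70.809066

# Compression ratio (output / input); a value above 1 means the file grew.
ratio=$(awk -v i="$input_bytes" -v o="$output_bytes" 'BEGIN { printf "%.4f", o / i }')

# Input processed per second, in MiB/s.
throughput=$(awk -v i="$input_bytes" -v s="$wall_seconds" 'BEGIN { printf "%.1f", i / 1048576 / s }')

echo "ratio: $ratio"               # ratio: 1.0045
echo "throughput: $throughput MiB/s"
```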

Decompressing

pbzip2 -d file1g-pigz.bz2
$ ls -l file1g-pigz
-rw-r--r-- 1 root root 1073741824 Jun 24 12:33 file1g-pigz

Compressing with tar and pbzip2

tar cf file1g-pigz.tar.bz2 --use-compress-prog=pbzip2 file1g-pigz

Decompressing

pbzip2 -d file1g-pigz.tar.bz2
$ ls -l file1g-pigz.tar 
-rw-r--r-- 1 root root 1073745920 Jun 24 12:46 file1g-pigz.tar

pbzip2 help page

$ pbzip2 --help
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward

Usage: pbzip2 [-1 .. -9] [-b#cdfhklm#p#qrS#tVz] <filename> <filename2> <filenameN>
 -1 .. -9        set BWT block size to 100k .. 900k (default 900k)
 -b#             Block size in 100k steps (default 9 = 900k)
 -c,--stdout     Output to standard out (stdout)
 -d,--decompress Decompress file
 -f,--force      Overwrite existing output file
 -h,--help       Print this help message
 -k,--keep       Keep input file, don't delete
 -l,--loadavg    Load average determines max number processors to use
 -m#             Maximum memory usage in 1MB steps (default 100 = 100MB)
 -p#             Number of processors to use (default: autodetect [8])
 -q,--quiet      Quiet mode (default)
 -r,--read       Read entire input file into RAM and split between processors
 -S#             Child thread stack size in 1KB steps (default stack size if unspecified)
 -t,--test       Test compressed file integrity
 -v,--verbose    Verbose mode
 -V,--version    Display version info for pbzip2 then exit
 -z,--compress   Compress file (default)
 --ignore-trailing-garbage=# Ignore trailing garbage flag (1 - ignored; 0 - forbidden)

If no file names are given, pbzip2 compresses or decompresses from standard input to standard output.

Example: pbzip2 -b15vk myfile.tar
Example: pbzip2 -p4 -r -5 myfile.tar second*.txt
Example: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/
Example: pbzip2 -d -m500 myfile.tar.bz2
Example: pbzip2 -dc myfile.tar.bz2 | tar x
Example: pbzip2 -c < myfile.txt > myfile.txt.bz2

pigz: a parallel implementation of gzip

https://zlib.net/pigz/

: A parallel implementation of gzip for modern multi-processor, multi-core machines

Installing the pigz package

yum install -y pigz

$ yum install -y pigz

Creating a 5 GB file

dd if=/dev/urandom of=file_5GB count=512 bs=10M

$ dd if=/dev/urandom of=file_5GB count=512 bs=10M
512+0 records in
512+0 records out
5368709120 bytes (5.4 GB) copied, 44.4681 s, 121 MB/s

$ ls -lh file_5GB
-rw-r--r-- 1 root root 5.0G Mar 24 18:23 file_5GB

Compressing with tar

tar --use-compress-prog=pigz -cf file_5GB-pigz.tar.gz file_5GB

$ tar --use-compress-prog=pigz -cf file_5GB-pigz.tar.gz file_5GB

CPU usage

Extracting with tar

tar --use-compress-prog=pigz -xf file_5GB-pigz.tar.gz

$ tar --use-compress-prog=pigz -xf file_5GB-pigz.tar.gz
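
Because pigz writes an ordinary gzip stream, an archive created with it can be extracted on a machine that only has gzip. The sketch below illustrates this with a pipe; demo_dir, data.txt and demo.tar.gz are made-up names, and the check is skipped when pigz or gzip is not installed.

```shell
# Archive with tar | pigz, then extract with plain gzip.
# Assumptions: demo_dir, data.txt and demo.tar.gz are illustrative names.
workdir=$(mktemp -d)
mkdir "$workdir/demo_dir" "$workdir/restored"
seq 1 50000 > "$workdir/demo_dir/data.txt"
gzip_status="skipped"

if command -v pigz >/dev/null 2>&1 && command -v gzip >/dev/null 2>&1; then
    # -p 2 allows up to two compression threads; with no file argument
    # pigz compresses stdin to stdout.
    tar -C "$workdir" -cf - demo_dir | pigz -p 2 > "$workdir/demo.tar.gz"

    # Extract without pigz: serial gzip decompresses, tar reads from stdin.
    gzip -dc "$workdir/demo.tar.gz" | tar -C "$workdir/restored" -xf -

    if cmp -s "$workdir/demo_dir/data.txt" "$workdir/restored/demo_dir/data.txt"; then
        gzip_status="match"
    else
        gzip_status="mismatch"
    fi
fi

echo "pigz -> gzip round trip: $gzip_status"
rm -rf "$workdir"
```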

pigz help page

$ pigz --help
Usage: pigz [options] [files ...]
  will compress files in place, adding the suffix '.gz'.  If no files are
  specified, stdin will be compressed to stdout.  pigz does what gzip does,
  but spreads the work over multiple processors and cores when compressing.

Options:
  -0 to -9, -11        Compression level (level 11, zopfli, is much slower)
  --fast, --best       Compression levels 1 and 9 respectively
  -b, --blocksize mmm  Set compression block size to mmmK (default 128K)
  -c, --stdout         Write all processed output to stdout (won't delete)
  -d, --decompress     Decompress the compressed input
  -f, --force          Force overwrite, compress .gz, links, and to terminal
  -F  --first          Do iterations first, before block split for -11
  -h, --help           Display a help screen and quit
  -i, --independent    Compress blocks independently for damage recovery
  -I, --iterations n   Number of iterations for -11 optimization
  -k, --keep           Do not delete original file after processing
  -K, --zip            Compress to PKWare zip (.zip) single entry format
  -l, --list           List the contents of the compressed input
  -L, --license        Display the pigz license and quit
  -M, --maxsplits n    Maximum number of split blocks for -11
  -n, --no-name        Do not store or restore file name in/from header
  -N, --name           Store/restore file name and mod time in/from header
  -O  --oneblock       Do not split into smaller blocks for -11
  -p, --processes n    Allow up to n compression threads (default is the
                       number of online processors, or 8 if unknown)
  -q, --quiet          Print no messages, even on error
  -r, --recursive      Process the contents of all subdirectories
  -R, --rsyncable      Input-determined block locations for rsync
  -S, --suffix .sss    Use suffix .sss instead of .gz (for compression)
  -t, --test           Test the integrity of the compressed input
  -T, --no-time        Do not store or restore mod time in/from header
  -v, --verbose        Provide more verbose output
  -V  --version        Show the version of pigz
  -z, --zlib           Compress to zlib (.zz) instead of gzip format
  --                   All arguments after "--" are treated as files
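
By default pigz uses every online processor, which can starve other work on a shared host. One way to limit it is to compute a value for -p first. The sketch below caps the thread count at half the available cores; that policy is an arbitrary assumption for illustration, and the script only prints the command instead of running it.

```shell
# Cap pigz at half of the online processors (at least one thread).
# Assumption: "half the cores" is an arbitrary illustrative policy.
if command -v nproc >/dev/null 2>&1; then
    cores=$(nproc)
else
    cores=$(getconf _NPROCESSORS_ONLN)
fi

threads=$((cores / 2))
if [ "$threads" -lt 1 ]; then
    threads=1
fi

# -k keeps the original file, -p sets the maximum number of compression threads.
echo "would run: pigz -k -p $threads file_5GB"
```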

Creating a test file (1 GB)

dd if=/dev/urandom of=file1g-pigz count=1024 bs=1M
$ dd if=/dev/urandom of=file1g-pigz count=1024 bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.90868 s, 155 MB/s

Decompressing with pigz

pigz -d file10g-pigz.tar.gz
unpigz file10g-pigz.tar.gz