Parallel compression

pbzip2: a parallel implementation of bzip2

https://launchpad.net/pbzip2

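: PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor. It uses pthreads and achieves near-linear speedup on SMP machines. Its output is fully compatible with bzip2 v1.0.2 or newer (that is, anything compressed with pbzip2 can be decompressed with bzip2). PBZIP2 should work on any system that has a pthreads-compatible C++ compiler (such as gcc).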
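The compatibility claim above can be checked with a quick round trip. The following is a minimal sketch, not part of the original post: `roundtrip` is a hypothetical helper name, and it assumes both tools accept `-c` (write to stdout) and `-dc` (decompress to stdout) while reading from stdin, as bzip2 and pbzip2 do.

```shell
# roundtrip COMPRESSOR DECOMPRESSOR FILE
# Hypothetical helper: compress FILE with one tool, decompress the stream
# with another, and succeed only if the checksum of the data is unchanged.
roundtrip() {
    compressor=$1
    decompressor=$2
    file=$3
    before=$(cksum < "$file")
    after=$("$compressor" -c < "$file" | "$decompressor" -dc | cksum)
    [ "$before" = "$after" ]
}

# Run the check only when both tools are installed.
if command -v pbzip2 >/dev/null 2>&1 && command -v bzip2 >/dev/null 2>&1; then
    sample=$(mktemp)
    head -c 1048576 /dev/urandom > "$sample"   # 1 MiB of test data
    if roundtrip pbzip2 bzip2 "$sample"; then
        echo "pbzip2 output was decompressed correctly by bzip2"
    else
        echo "round trip mismatch"
    fi
    rm -f "$sample"
fi
```

Swapping the two tool names checks the opposite direction (bzip2 output read by pbzip2).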

 

  • Install the pbzip2 package
yum install -y pbzip2
  • Create a 10 GB file
$ dd if=/dev/urandom of=file_10GB count=1024 bs=10M
1024+0 records in
1024+0 records out
10737418240 bytes (11 GB) copied, 91.2048 s, 118 MB/s
$ ls -lh file_10GB
-rw-r--r-- 1 root root 10G  3월 24 17:42 file_10GB
  • Compress with tar
tar --use-compress-prog=pbzip2 -cf file10g-pbzip2.tar.bz2 file_10GB
  • CPU usage

[Figure t1: CPU usage during pbzip2 compression]

  • Extract with tar
tar --use-compress-prog=pbzip2 -xf file10g-pbzip2.tar.bz2
  • pbzip2 help page
$ pbzip2 --help
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward

Usage: pbzip2 [-1 .. -9] [-b#cdfhklm#p#qrS#tVz] <filename> <filename2> <filenameN>
 -1 .. -9        set BWT block size to 100k .. 900k (default 900k)
 -b#             Block size in 100k steps (default 9 = 900k)
 -c,--stdout     Output to standard out (stdout)
 -d,--decompress Decompress file
 -f,--force      Overwrite existing output file
 -h,--help       Print this help message
 -k,--keep       Keep input file, don't delete
 -l,--loadavg    Load average determines max number processors to use
 -m#             Maximum memory usage in 1MB steps (default 100 = 100MB)
 -p#             Number of processors to use (default: autodetect [8])
 -q,--quiet      Quiet mode (default)
 -r,--read       Read entire input file into RAM and split between processors
 -S#             Child thread stack size in 1KB steps (default stack size if unspecified)
 -t,--test       Test compressed file integrity
 -v,--verbose    Verbose mode
 -V,--version    Display version info for pbzip2 then exit
 -z,--compress   Compress file (default)
 --ignore-trailing-garbage=# Ignore trailing garbage flag (1 - ignored; 0 - forbidden)

If no file names are given, pbzip2 compresses or decompresses from standard input to standard output.

Example: pbzip2 -b15vk myfile.tar
Example: pbzip2 -p4 -r -5 myfile.tar second*.txt
Example: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/
Example: pbzip2 -d -m500 myfile.tar.bz2
Example: pbzip2 -dc myfile.tar.bz2 | tar x
Example: pbzip2 -c < myfile.txt > myfile.txt.bz2
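The `tar --use-compress-prog=pbzip2` form used earlier runs pbzip2 with its defaults. When options such as `-p#` are needed, one approach is the pipe style that the help text itself shows (`pbzip2 -dc myfile.tar.bz2 | tar x`). The sketch below wraps that pattern; `pack` and `unpack` are hypothetical helper names, and `dir_to_compress`, `myfile.tar.bz2`, and `restored` are placeholder paths.

```shell
# pack ARCHIVE SRC COMPRESSOR [ARGS...]
# Hypothetical helper: stream a tar of SRC through the given compressor command.
pack() {
    archive=$1
    src=$2
    shift 2
    tar -cf - "$src" | "$@" > "$archive"
}

# unpack ARCHIVE DEST DECOMPRESSOR [ARGS...]
# Hypothetical helper: decompress ARCHIVE with the given command and extract into DEST.
unpack() {
    archive=$1
    dest=$2
    shift 2
    "$@" < "$archive" | tar -xf - -C "$dest"
}

# Example: limit pbzip2 to 4 threads while compressing (placeholder paths).
if command -v pbzip2 >/dev/null 2>&1 && [ -d dir_to_compress ]; then
    pack myfile.tar.bz2 dir_to_compress pbzip2 -p4 -c
    mkdir -p restored
    unpack myfile.tar.bz2 restored pbzip2 -dc
fi
```

Because the compressor is passed as an ordinary command line, the same helpers work with any tool that reads stdin and writes stdout.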

pigz: a parallel implementation of gzip

https://zlib.net/pigz/

: A parallel implementation of gzip for modern multi-processor, multi-core machines
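To see how much the extra cores help on a given machine, the serial and parallel tools can be timed on the same input. This is a rough sketch under stated assumptions: `elapsed_seconds` is a hypothetical helper, it measures wall-clock time in whole seconds only (via `date +%s`), and the 100 MiB sample size is arbitrary.

```shell
# elapsed_seconds CMD [ARGS...]
# Hypothetical helper: run a command, discard its stdout, and print the
# wall-clock time it took in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" > /dev/null
    end=$(date +%s)
    echo $((end - start))
}

# Compare gzip and pigz on the same sample, if both are installed.
if command -v pigz >/dev/null 2>&1 && command -v gzip >/dev/null 2>&1; then
    sample=$(mktemp)
    head -c 104857600 /dev/urandom > "$sample"   # 100 MiB of test data
    echo "gzip: $(elapsed_seconds gzip -c "$sample")s"
    echo "pigz: $(elapsed_seconds pigz -c "$sample")s"
    rm -f "$sample"
fi
```

A larger sample gives more meaningful numbers, since one-second resolution is coarse for small inputs.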

  • Install the pigz package
yum install -y pigz
  • Create a 5 GB file
$ dd if=/dev/urandom of=file_5GB count=512 bs=10M
512+0 records in
512+0 records out
5368709120 bytes (5.4 GB) copied, 44.4681 s, 121 MB/s
$ ls -lh file_5GB
-rw-r--r-- 1 root root 5.0G  3월 24 18:23 file_5GB
  • Compress with tar
tar --use-compress-prog=pigz -cf file_5GB-pigz.tar.gz file_5GB
  • CPU usage

[Figure t2: CPU usage during pigz compression]

  • Extract with tar
tar --use-compress-prog=pigz -xf file_5GB-pigz.tar.gz
  • pigz help page
$ pigz --help
Usage: pigz [options] [files ...]
  will compress files in place, adding the suffix '.gz'.  If no files are
  specified, stdin will be compressed to stdout.  pigz does what gzip does,
  but spreads the work over multiple processors and cores when compressing.

Options:
  -0 to -9, -11        Compression level (level 11, zopfli, is much slower)
  --fast, --best       Compression levels 1 and 9 respectively
  -b, --blocksize mmm  Set compression block size to mmmK (default 128K)
  -c, --stdout         Write all processed output to stdout (won't delete)
  -d, --decompress     Decompress the compressed input
  -f, --force          Force overwrite, compress .gz, links, and to terminal
  -F  --first          Do iterations first, before block split for -11
  -h, --help           Display a help screen and quit
  -i, --independent    Compress blocks independently for damage recovery
  -I, --iterations n   Number of iterations for -11 optimization
  -k, --keep           Do not delete original file after processing
  -K, --zip            Compress to PKWare zip (.zip) single entry format
  -l, --list           List the contents of the compressed input
  -L, --license        Display the pigz license and quit
  -M, --maxsplits n    Maximum number of split blocks for -11
  -n, --no-name        Do not store or restore file name in/from header
  -N, --name           Store/restore file name and mod time in/from header
  -O  --oneblock       Do not split into smaller blocks for -11
  -p, --processes n    Allow up to n compression threads (default is the
                       number of online processors, or 8 if unknown)
  -q, --quiet          Print no messages, even on error
  -r, --recursive      Process the contents of all subdirectories
  -R, --rsyncable      Input-determined block locations for rsync
  -S, --suffix .sss    Use suffix .sss instead of .gz (for compression)
  -t, --test           Test the integrity of the compressed input
  -T, --no-time        Do not store or restore mod time in/from header
  -v, --verbose        Provide more verbose output
  -V  --version        Show the version of pigz
  -z, --zlib           Compress to zlib (.zz) instead of gzip format
  --                   All arguments after "--" are treated as files
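Since pigz "does what gzip does", a script can prefer it when it is installed and fall back to plain gzip otherwise. The sketch below shows one way to do that; `pick_compressor` is a hypothetical helper name, and `file_5GB` is the test file created earlier.

```shell
# pick_compressor CANDIDATE...
# Hypothetical helper: print the first candidate found on PATH.
# Returns non-zero if none of the candidates is installed.
pick_compressor() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer pigz, fall back to gzip; only run if the test file exists.
if gz=$(pick_compressor pigz gzip) && [ -e file_5GB ]; then
    echo "compressing with $gz"
    tar --use-compress-program="$gz" -cf file_5GB.tar.gz file_5GB
fi
```

The same pattern applies to `pick_compressor pbzip2 bzip2` for `.tar.bz2` archives.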
