CPS616 Compression Technology CPS600 Module on Compression March 4,1995 Roman Markowski NPAC Syracuse University 111 College Place Syracuse NY 13244-4100 Abstract of CPS600 Compression Module This set of foils describes image and video compression schemes concentrating on Wavelets which seem most powerful although MPEG using related but less efficient Fourier technology will be used much more widely initially Wavelets are described in detail for Image case where they aree discussed for Telemedicine application. JPEG JBIG Fractal H.261 Schemes are briefly reviewed Compressing Still and Moving Images image 2000x2000x24 bpp = 12MB one second of NTSC-quality video requires 23 MB compression - eliminating of redundant or less critical information Spatial redundancy: values of neighboring pixels are strongly correlated Spectral redundancy: the spectral values for the same pixel location are correlated temporal redundancy: frames show very little change in the sequence decreases the time and cost of transmission and the requirements for storage Image/Video Compression lossless - removes the redundancy in the signal; ratio 3:1; the heights of every pixel are perfectly reproduced lossy - selectively discards "less important" information; ratio 100:1 controversy: the critical feature of any lossy compression is what is important and what is not. evaluation of several image compression technologies JPEG - the leading standard in visual compression (compresses the image block by block), lossy, full color, block by block JBIG - binary images, lossless, gray scale Fractal - compression slow, decompression fast, lossy, 1000:1, bad quality PhotoCD - Kodak, 96x64, 192x128, 384x256, 768x512, 1536x1024,3072x2048 Wavelet - discovered in 1987, lossy, ratio 100:1, compresses the image as a whole Huffman and Other Compression Techniques Huffman encoding; Shannon-Fano encoding; arithmetic encoding input is sliced into fixed units while the corresponding output comes in chunks of variable size RLE -Run Length Encoding (TIFF, BMP) lossless: Replace upto 127 identical characters by two bytes -- first byte is number of identical characters in string, second byte is character itself For example: AAAAbbbbbccct ---> 4A5b3c1t LZW - Lepel-Ziv-Welch (TIFF, GIF) Directory based encoding algorithm compress, zoo, lha, pkzip. arj LZ77, LZ78 input is divided into units of variable length (words) and coded in a fixed-length output code shorter bit patterns for most common characters DCT - Discrete Cosine Transform (MPEG, JPEG, H.261) lossy; converts the spatial image representation into a frequency map DCT scheme is effective only for compressing continuous-tone images (from "Graphics File Formats", O'Reilly and Associates Inc., page 162 in description of JPEG algorithm) JPEG - Joint Photographic Experts Group current international standard for image compression lossy algorithm: the threshold of visible difference 20:1 lossless compression mode: 2:1 ratio, 12bpp (bpp is bits per pixel) designed for compression of full-color (24 bit) or gray-scale digitized images of "natural" scenes (continuous tone) exploits the known limitations of the human eye (10,000 colors simultaneously) ratio can be varied DCT (Discrete Cosine Transformation) and Huffman coding source available (Independent JPEG Group) ftp://ftp.uu.net/graphics/jpeg/wallace.ps.Z JPEG Algorithm Specification -- I First transfer the image into a suitable color space (RGB --> Image representations which separate luminance and chrominance YUV, YCbCr, etc.); the human eye is not as sensitive to high-frequency color as it is to high-frequency luminance YUV is used in European TV standards and corresponds to YIQ in American NTSC. Y specifies gray-scale or luminance; U and V the chrominance YCbCr is another color space where again Y is component of luminancy but Cb and Cr are respectively ther color components in blue(Cb) and red(Cr) the luminance component is left at full resolution; color is usually reduced 2:1 (2v1h,2v2h) This means chrominance reduced 2:1 vertically wiuth no horizontal reduction(2v1h) or reduced both horizontally and vertically (2v2h) JPEG Algorithm Specification -- II Then group the pixels for each component into 8x8 blocks; transform each block through DCT Note for later -- need blocks as natural frequencies varies over image and not constant on a line of pixels as would correspond to Fourier transforms over full x or y of image. Wavelets "naturally" chooses block size. JPEG has fixed blocks In each block, divide each of the 64 frequency components by a separate "quantization coefficient" and round to integers; fundamental lossy step Encode the reduced coefficients using Huffman or arithmetic coding (license) Add headers and output the result JBIG - Joint-bi-level Image Experts Group Losslessly compresses binary images (one bit/pixel) Effective for bi-level images (black and white) Extended to gray scales with upto 8 bits per pixel with good results upto 6 bits per pixel Based on IBM's Q-coder - patented technology (no source) Fractal Compression -- I lossy algorithm patented technology by Barnsley(no source) compression extremely slow (many hours) decompression fast theoretically very high ratio - 1000:1 based on Iterated Function Theory and Partitioned Iterated Function Theory for compression ratio up to 40:1 JPEG is better; quality worse than wavelets or JPEG details are generated when zooming in the advanced form of interpolation Origins of Fractal Compression Birth of fractal geometry in paper by B.Mandelbrot "the Fractal Geometry of Nature", 1977 J. Hutchinson: Iterated Function Theory, 1981 M.Barnsley, "Fractals Everywhere", 1988 in the forward direction fractal mathematics is good for generating natural looking images (trees, clouds, mountains) Used in Computer Graphics (Fractal trees,Mountains etc.) in reverse direction can be used to compress images inverse problem: to go from a given image to Iterated Function System that can generate the original (unsolved) there are not many fractal compression programs available the fractals that lurk within fractal image compression are not those of the complex plane (Mandelbrot, Julia), but of Iterated Function Theory example: Sierpinski's Triangle MPEG - Moving Picture Experts Group standard for compression (synchronized) audio and video 4 parts: video encoding, audio encoding, "systems" (synchronization) and compliance testing defines 352x240 pixels (30 fps in USA, 25fps in Europe) MPEG-1- designed for bandwidth up to 1.5 Mbps MPEG-2 - higher quality; designed for bandwidth 4-10 Mbps MPEG-3 - does not longer exist; MPEG-4 - very low bitrate coding DCT (Discrete Cosine Transform ) done on 8x8 blocks lossy algorithm with compression in space (DCT) and time (frame to frame) 3 types of frames I (Intra) frames - sent every 10-12 frames as still images P (Predicted) frames - deltas from the most recent I or P frame B (Bidirectional) frames - interpolation between I and P frames H.261 - similar to but not compatible with MPEG CCITT standard VideoCodec for Audiovisual Services at px64 Kbps (p=1..30) describes videosource coder, video multiplex coder and the transmission coder designed for ISDN defines two picture formats: CIF (Common Intermedia Format) - 288x360 luminance, 144x180 chrominance QCIF (Quarter CIF) - 144x180 of luminance and 72x90 of chrominance Performance Measures compression ratio is PSNR = Peak-signal-to-noise ratio (in dB) RMSE is Root Mean Standard Error Introduction to Wavelets (1) new technology signal analysis - weighted sum of basis functions Infinitely many possible sets of wavelets coefficients contain information about the signal basis functions impulse function reveals information only about the time domain behavior of the signal Fourier representation reveals information about signal's frequency domain behavior we want to have representation which contains info about both the time and frequency (frequency content of the signal at the particular instant of time) Heisenberg inequality - resolution in time and in frequency cannot both be made arbitrarily small Introduction to Wavelets (2) Image: low frequency events are spread out in time and high frequency events are localized in time Further structure of Image is not uniform over x or y direction of image but rather in blocks where say head or a particular detail is. This is why JPEG uses FFT's in blocks. Wavelets automatically build in block structure adaptively so get more detail where you need it. sines or cosines (FFT) - cannot provide information about the time behavior of signal because they have infinite support impulse function - frequency behavior is not described because of its infinitesimally small support wavelets - set of basis functions, each with finite support of a different width wide variety of wavelet-based image compression schemes reported in the literature Discrete Wavelet Transform FFT vs DWT - Individual wavelet localized pretty well in both time (space) and frequency Fourier expansion functions localized very well in frequency but not all in space or time Both fast, linear operations; invertible and orthogonal rotation in function space From Impulse (delta function) space to that of coefficients of Fourier or Wavelet functions Wavelet Transformation on N points performed as log2N rotations in multiresolution fashion wavelet - localized in space and frequency; Basis consists of scalings and translations of Mother functions Structure of Wavelet analysis Representation of general functions in terms of simpler, fixed building blocks Wavelets = small waves continuous wavelet transform: translation (b) and dilation (a) parameters vary continuously or Discrete wavelet transform Multiresolution analysis Wavelet Transform Characteristics decorrelates the pixel values of the image and concentrates the image info into a relatively small number of coefficients can be implemented as a pair of Quadrature Mirror Filters (QMF) which splits a signal's bandwidth in half lowpass or smoothing filter (H) highpass filter (G) which expresses detail Daubechie's Mother wavelets Mathematical Structure of Discrete Wavelet Transform-I Transform can be written for one of log2N steps as a matrix operation given on following foil. As we will see each step deals with half of data from previous step and resolution examined doubles at each step In this way we look at ALL resolutions i.e. all length scales Each smoothing is constructed in this particular wavelet formulation as an average over 4 points The matrix is orthogonal if: The matrix will zero first two moments on sets of four points if: Matrix Structure of a Simple Wavelet Transformation This is basic matrix for 1D Wavelet transform where points are smoothed in groups of four Mathematical Structure of Discrete Wavelet Transform-II The solution of these 4 equations is: The Smoothing Filter H is the average: The detail is contained in operator G: First step produces N/2 smooth and N/2 detail values. Then we take N/2 smooth values and repeat step getting N/4 smooth and N/4 further detail results at double spatial resolution This recursively leads to pyramid algorithm Pyramid Algorithm in one dimension How Image wavelet compression works We have reviewed 1D Wavelet Transform There are Three steps in 2D Image Case transformation of the image data using a predefined set of basis functions (multiresolution and orthogonal) quantization of the basis function coefficients (lossy) coding of the resulting data set to remove redundancy (Huffman lossless encoding) How wavelet compression works Pictorially 2D Forward/inverse wavelet transform Best methods use direct 2D methods starting with 2 by 2 or similar pixel blocks. However one can compose in several ways 1D transforms. It is NOT best to first transform in x and then transform in y. Rather ... A nice method uses two separate but INTERMIXED 1D transforms Image filtered along the x dimension Downsample by 2 along x Filter along the y dimension Downsample by 2 along y Recursively transform the average signal (depending on required ratio) We have matrix of coefficients (average signal and detail signals of each scale); no compression has been accomplished yet; in fact there has been an increase in amount of storage required 2D Forward wavelet transform 2D Inverse wavelet transform Wavelets -- Quantization (1) compression is achieved by quantizing and encoding wavelet coefficients coefficients that corresponds to smooth parts of data become small and can be set to 0 Wavelets -- Quantization (2) we can eliminate (set to zero) these coefficients with small magnitudes without creating significant distortion in the reconstructed image energy invariance total amount of energy in the image does not change when the wavelet transform is applied any change to wavelet coefficients will result in proportional changes in the pixel values of the reconstructed image we can eliminate all but a few percent of wavelet coefficients thresholding function - amount of compression controlled by "t" parameter Wavelets -- Quantization (3) Quantizing of non-zero wavelet coefficients many-to-one staircase functions decision points {Di, i=0,...,n} reconstruction levels {Ri, i=0,...,n} input value 'x' is mapped to a reconstruction level 'Ri' if 'x' lies in the interval (Di, Di+1] separate quantizer for each scale Wavelets -- Coding codec = encoder/decoder losslessly compressing and decompressing the sparse matrix of quantized coefficients wide variety of schemes compromise between: memory, execution speed, available bandwidth, reconstructed image quality example: Shapiro's Zero Tree encoding Wavelets in Telemedicine Massachusetts General Hospital: no clinically significant image degradation was identified in radiology images up to 30:1 wavelet-based compression technology is superior to all other compression technologies (keeps details, high compression ratio) from a signal processing standpoint, one may view the image as a signal that has high-frequency (high-spatial detail) and low-frequency (smooth) components.The algorithm filters the signal and then iterates the process. Wavelets -existing software HCOMPRESS - astronomical images, Richard L.White, Space Telescope Science Institute (gray scale) EPIC - Efficient Pyramid Image Coder, Eero P.Simoncelli, MIT Media Library IMAGER LIBRARY - available on the Web via the Wavelet Digest home page Commercially available software for medical applications; Aware, Inc - AccuPress Library Comparison W6+VLC, Biorthogonal+VLC, JPEG image coders Wavelets -- Video compression exploiting the temporal redundancies present in an image sequence (as done by MPEG) techniques: hierarchical motion compression, 3D subband coding very time consuming - 256x256x8bpp image on 66MHz 80486 - 0.25 sec inverse wavelet transform of a single full frame (4 frames per second) it is necessary to perform the complete inverse transform for each frame (in a slowly varying image sequence) 16 frames per second if just transform changes in image -- fewer nonzero pixels and so faster Block diagram of the video encoder Block diagram of the video decoder Wavelet references wavelet@math.scarolina.edu http://www.mathsoft.com/wavelets.html "Numerical recipes", WH Press, SA Teukolsky, WT Veterling, BP Flannery "Ten Lectures on Wavelets", Ingrid Daubechies, 1992