Home | Alpha Telephone | Domain Names | Web Hosting | Get Traffic | xrEvidence | xrSoccer

United States Patent

Previous       Show 10       Next


United States Patent 5,122,873
Golin June 16, 1992

Method and apparatus for selectively encoding and decoding a digital motion video signal at multiple resolution levels


Abstract

At least one selected image of a digital video signal is encoded at multiple levels of resolution. For each level of resolution, a correction image is formed by subtracting the value of each pixel in a reference image of that resolution level from the value of each corresponding pixel in the selected image of that resolution level. The correction image is quantized and encoded. A decoder decodes the encoded quantized correction image for each resolution level. The value of each pixel in the decoded correction image having the lowest resolution is added to the value of each corresponding pixel of a reference image having the same resolution to form a result image of the lowest resolution. This result image is expanded to the next higher level of resolution. The value of each pixel in the expanded result image is added to the value of each corresponding pixel in the decoded correction image of the same resolution level to form a result image of that resolution level. This process is continued until each pixel value in the expanded result image, which has been expanded to full resolution, is added to the value of each corresponding pixel in the full resolution decoded correction image to form the final image which is stored in memory for subsequence display.


Inventors: Golin; Stuart J. (East Windsor, NJ)
Assignee: Intel Corporation (Santa Clara, CA)
Appl. No.: 657708
Filed: February 19, 1991

Related U.S. Patent Documents

Application NumberFiling DatePatent NumberIssue Date
408085Sep., 1989
104457Oct., 19874868653

Current U.S. Class: 375/240.23 ; 382/240; 382/302
Field of Search: 358/133,138,136,13 382/49


References Cited

U.S. Patent Documents
4268861 May 1981 Schreiber et al.
4523230 June 1985 Carlson et al.
4692806 September 1987 Anderson et al.
4709394 November 1987 Bessler et al.
4718104 January 1988 Anderson
4864396 September 1989 Martens
4941042 July 1990 Martens
5048111 September 1991 Jones et al.
5050230 September 1991 Jones et al.
Primary Examiner: Kostak; Victor R.
Attorney, Agent or Firm: Silverman; Carl L. Murray; William H.

Parent Case Text



CROSS REFERENCE TO RELATED APPLICATIONS

This application is a Continuation-In-Part of copending U.S. patent application Ser. No. 07/408,085, filed Sep. 15, 1989; which is a continuation of U.S. patent application Ser. No. 07/104,457, filed Oct. 5, 1987, now U.S. Pat. No. 4,868,653.
Claims



What is claimed is:

1. A method of encoding a digital motion video signal comprising the steps of:

a--selecting at least one frame from a sequence of frames of said digital motion video signal;

b--resolving each selected frame into at least one lower level of resolution;

c--encoding at least one lower resolution level of each of said selected frames; and

d--encoding each non-selected frame of said digital motion video signal using only the full resolution level.

2. The method in accordance with claim 1 wherein step c comprises encoding at least one lower resolution level but not encoding the full resolution level of each of said selected frames.

3. The method in accordance with claim 1 in which step c comprises, for each level of resolution of said selected frame which is encoded, the steps of:

i--forming a correction image comprising an array of pixel values, by subtracting pixel values in a reference image from corresponding pixel values in that level of resolution of said selected frame;

ii--quantizing the pixel values in said correction image to form a quantized correction image; and

iii--encoding said quantized correction image.

4. The method in accordance with claim 3 in which step i for the lowest level of resolution of said selected frame includes the step of providing a reference frame comprising an array of pixels all of which have the same value.

5. The method in accordance with claim 4 in which step i for each level of resolution higher than the lowest level of resolution of said selected frame includes the steps of:

(1) forming a result image having a lower level of resolution by adding the pixel values of the quantized correction image at that lower level of resolution to the pixel values of the reference image at that lower level of resolution; and

(2) forming the reference image at the higher level of resolution by expanding the result image formed in step (1) to said higher level of resolution.

6. The method in accordance with claim 3 wherein step iii comprises the steps of dividing said quantized correction image into regions and generating region parameters descriptive of the position, size and fill data for each region.

7. The method in accordance with claim 6 wherein the step of dividing said quantized correction image into regions comprises dividing said image into null and non-null regions or into null and linear fill regions.

8. The method in accordance with claim 7 wherein said region descriptions are encoded using binary tree decomposition.

9. The method in accordance with claim 8 in which said non-null regions are encoded using vector quantization or discrete cosine transform techniques.

10. The method in accordance with claim 9 wherein encoding using vector quantization comprises encoding using DYADS, QUADS, or DPCM.

11. The method in accordance with claim 1 in which step d comprises the steps of:

i--dividing each non-selected frame into regions and generating region parameters descriptive of the location and size of respective regions within said frame;

ii--for non-selected still frames or a first frame of a sequence of non-selected frames:

(1) determining absolute fill data describing respective regions;

(2) determining DPCM fill data describing respective regions for which absolute fill data cannot be determined;

(3) variable length encoding said absolute fill data, said DPCM fill data and corresponding region descriptive parameters;

(4) determining an estimate of decompression time from said variable length encoded absolute, DPCM fill data and corresponding region descriptive parameters;

(5) providing said variable length encoded fill data and corresponding region parameters if said estimate is less than a predetermined value, and if not, repeating steps (i) and (1)-(5) using a different tolerance in determining said absolute fill data; and

iii--for a succession of non-selected frames:

(1) comparing pixel values in each region with corresponding pixel values in a like sized region from a previously occurring frame to develop respective pixel differences;

(2) determining relative fill data descriptive of pixel differences for respective regions;

(3) determining DYAD fill data for respective regions in which relative fill data cannot be determined;

(4) determining DPCM fill data for respective regions in which relative fill data cannot be determined and for which DYAD fill data cannot be determined; and

(5) variable length coding said relative fill data, said DYAD fill data, said DPCM fill data and corresponding region parameters.

12. The method in accordance with claim 1 wherein step c comprises the steps of:

i--dividing each resolution level which is en

coded into null and non-null regions;

ii--encoding said regions using binary tree decomposition;

iii--determining vector values describing the pixel values in said non-null regions; and

iv--quantizing said vector values.

13. The method in accordance with claim 12 wherein said vector values include DYADS, QUADS and DPCM.

14. The method in accordance with claim 1 wherein step a comprises the steps of:

i--forming a motion compensated difference image for each frame of said digital motion video signal;

ii--determining the value of a selected parameter associated with said motion compensated difference image;

iii--selecting a threshold value of said selected parameter; and

iv--selecting those frames from the sequence of frames having parameter values which exceed said predetermined thresholds.

15. A method of encoding a digital motion video signal comprising the steps of:

a--selecting at least one frame from a sequence of frames of said digital motion video signal;

b--encoding each selected frame as a still image including the steps of:

(1) resolving each selected frame into at least one lower level of resolution; and

(2) encoding at least one of the lower levels of resolution but not encoding the full resolution level; and

c--encoding each non-selected frame of said digital motion video signal using at least the full level of resolution.

16. The method in accordance with claim 15 in which step b(2) comprises, for each level of resolution of said selected frame which is encoded, the steps of:

i--forming a correction image comprising an array of pixels, by subtracting pixel values in a reference image from corresponding pixel values in that level of resolution of said selected frame;

ii--quantizing the pixel values in said correction image to form a quantized correction image; and

iii--encoding said quantized correction image.

17. The method in accordance with claim 16 in which step i for the lowest level of resolution of said selected frame includes the step of providing a reference frame comprising an array of pixels all of which have the same value.

18. The method in accordance with claim 17 in which step i for each level of resolution higher than the lowest level of resolution of said selected frame includes the steps of:

(1) forming a result image having a lower level of resolution by adding the pixel values of the quantized correction image at that lower level of resolution to the pixel values of the reference image at that lower level of resolution; and

(2) forming the reference image at the higher level of resolution by expanding the result image formed in step (1) to said higher level of resolution.

19. The method in accordance with claim 16 wherein step iii comprises the steps of dividing said quantized correction image into regions and generating region parameters descriptive of the position, size and fill data for each region.

20. The method in accordance with claim 19 wherein the step of dividing said quantized correction image into regions comprises dividing said image into null and non-null regions or into null and linear fill regions.

21. The method in accordance with claim 18 wherein said region descriptions are encoded using binary tree decomposition.

22. The method in accordance with claim 19 in which said non-null regions are encoded using vector quantization or discrete cosine transform technique.

23. The method in accordance with claim 22 wherein encoding using vector quantization comprises encoding using DYADS, QUADS, or DPCM.

24. The method in accordance with claim 15 in which step c comprises the steps of:

i--dividing each non-selected frame into regions and generating region parameters descriptive of the location and size of respective regions within said frame;

ii--for non-selected still frames or a first frame of a sequence of non-selected frames:

(1) determining absolute fill data describing respective regions;

(2) determining DPCM fill data describing respective regions for which absolute fill data cannot be determined;

(3) variable length encoding said absolute fill data, said DPCM fill data and corresponding region descriptive parameters;

(4) determining an estimate of decompression time from said variable length encoded absolute, DPCM fill data and corresponding region descriptive parameters;

(5) providing said variable length encoded fill data and corresponding region parameters if said estimate is less than a predetermined value, and if not, repeating steps (i) and (1)-(5) using a different tolerance in determining said absolute fill data; and

iii--for a succession of non-selected frames:

(1) comparing pixel values in each region with corresponding pixel values in a like sized region from a previously occurring frame to develop respective pixel differences;

(2) determining relative fill data descriptive of pixel differences for respective regions;

(3) determining DYAD fill data for respective regions in which relative fill data cannot be determined;

(4) determining DPCM fill data for respective regions in which relative fill data cannot be determined and for which DYAD fill data cannot be determined; and

(5) variable length coding said relative fill data, said DYAD fill data, said DPCM fill data and corresponding region parameters.

25. The method in accordance with claim 15 wherein step b(2) comprises the steps of:

i--dividing each resolution level which is encoded into null and non-null regions;

ii--encoding said regions using binary tree decomposition;

iii--determining vector values describing the pixel values in said non-null regions; and

iv--quantizing said vector values.

26. The method of claim 25 wherein said vector values include DYADS, QUADS and DPCM.

27. The method of claim 15 wherein step a comprises the steps of:

i--forming a motion compensated difference image for each frame of said digital motion video signal;

ii--determining the value of a selected parameter associated with said motion compensated difference image;

iii--selecting a threshold value of said selected parameter; and

iv--selecting those frames from the sequence of frames having parameter values which exceed said predetermined thresholds.

28. An apparatus for encoding a digital motion video signal comprising:

a--means for selecting at least one frame from a sequence of frames of said digital motion video signal;

b--means for resolving each selected frame into at least one lower level of resolution;

c--means for encoding at least one lower resolution level of each of said selected frames; and

d--means for encoding each non-selected frame of said digital motion video signal using only the full resolution level.

29. A method of decoding a digital motion video signal in which at least one selected frame from a sequence of frames has been resolved into at least one lower level of resolution which is then encoded, and in which each non-selected frame has been encoded at the full resolution level only, said method comprising the steps of:

a--decoding each level of resolution of each selected frame to form a correction image comprising an array of pixel values having that level of resolution;

b--forming a result image at each level of resolution by adding the pixel values in a reference image having that level of resolution to the pixel values of the expansion of the correction image to the same level of resolution; and

c--decoding said non-selected frames.

30. An apparatus for decoding a digital motion video signal comprising:

a--means for applying a digital motion video signal in which at least one selected frame from a sequence of frames has been resolved into at least one lower level of resolution which is then encoded, and in which each non-selected frame has been encoded at the " full resolution level only;

b--means for decoding each level of resolution of each selected frame to form a correction image comprising an array of pixel values having that level of resolution;

c--means for forming a result image at each level of resolution by adding the pixel values in a reference image having that level of resolution to the pixel values of the expansion of the correction image to the same level of resolution; and

d--means for decoding said non-selected frames.
Description



FIELD OF THE INVENTION

This invention relates to video signal processing generally and particularly to systems for providing and decoding a compressed digital video signal representative of a full color video signal.

BACKGROUND OF THE INVENTION

The need for compression to facilitate recording of a digital video signal on relatively narrow-band media, such as a compact disc (CD), has been recognized. In a system proposed by Takahashi et al. in U.S. Pat. No. 4,520,401, a digital video signal is encoded using differential pulse code modulation (DPCM) for recording on a digital audio disc. In the known system, luminance (Y) and chrominance (R-Y, B-Y) components of a video frame are separately compressed using DPCM. A circuit divides the components into picture element data groups of a specific number of rows or columns which are adjacent on a screen. A header signal is provided having a synchronizing signal, a picture mode identification signal and a picture information quantity identification code. The header signal is added to the beginning position of each of the divided picture element data groups to produce a digital video output signal having a signal format in which the digital luminance, the two kinds of digital color difference signal and the header signal are time sequentially multiplexed and recorded.

In an example of the Takahashi et al. system still frames of digital video are recorded and updated at a rate of about four seconds per frame. The division of the compressed data into groups of lines with each group containing complete color information provides a psuedo-motion effect in that the line groups may be sequentially updated while displaying the previous frame thereby providing a partially moving picture.

The system described in the parent application Ser. No. 07/408,085 is directed to meeting the need for providing a compressed digital video signal representative of a full motion color video signal, which is suitable for recording or transmission using relatively narrow band media; and for decoding such a compressed signal to enable display of full motion video images at normal video frame rates. Although that system is effective for encoding and decoding most images, it was discovered that the system had some difficulty with certain images such as, for example, images containing a large amount of fast and/or uncoordinated motion; that is, motion which does not occur substantially uniformly in the same direction over relatively large portions of the image. It was found that such images were difficult to encode since the linear fill technique employed by the system was too gross an encoding method to be used with images containing substantial amounts of uncoordinated moving detail. With such images, it is desirable to encode each individual pixel; however, the system disclosed in the parent application is often allowed only an average of one-half bit for encoding each pixel. Therefore, it is desirable to add a feature to that system which will enable effective compression and decompression of selected images such as those having a substantial amount of moving detail.

SUMMARY OF THE INVENTION

The present invention is directed to meeting the need for a compression system for providing a compressed digital video signal representative of a full motion color video signal, which is suitable for recording or transmission using relatively narrow band media and which may be decompressed at speeds at least equal to conventional video frame rates. In a specific embodiment described herein over one-hour of recording time has been achieved for compact disc read only memory (CD-ROM) recording media for 30 frame-per-second full motion color digital video recording.

In accordance with an aspect of the invention, the first and second frames of a digital motion video signal are compressed using different compression methods and an output signal is formed including an identification code signifying each compression method.

In accordance with another aspect of the invention, each frame of a digital video signal is divided to form a plurality of regions and each region is separately analyzed and encoded by a selected one of several compression procedures to providing an optimum coding specific to the characteristics of the region being coded.

In accordance with another aspect of the invention, a digital motion video signal is compressed using compression thresholds controlled as a function of the number of bytes per frame and an estimate of the decoding time per frame of the compressed signal.

In accordance with yet another aspect of the invention, a video frame is split repeatedly to provide a plurality of regions to be individually encoded, and the split direction, vertical or horizontal, is determined by a comparison of distributions of pixel parameters associated with the regions.

In accordance with a further aspect of the present invention, at least one selected frame, also referred to as a selected image, of a digital video signal is encoded at multiple levels of resolution. For each level of resolution, a correction image is formed by subtracting the value of each pixel in a reference image of that resolution level from the value of each corresponding pixel in the selected image of that resolution level. The correction image is quantized and encoded. For each level of resolution, a result image is formed by adding the value of each pixel in the quantized correction image for that resolution level to the value of each corresponding pixel in the reference image having the same resolution level. The reference image of the lowest resolution level comprises an array of pixels all having the same value. The reference image of a higher resolution level is formed by expanding the result image of the next lower resolution level to that higher resolution level. A decoder decodes the encoded quantized correction image for each resolution level. The value of each pixel in the decoded correction image having the lowest resolution is added to the value of each corresponding pixel of a reference image having the same resolution to form a result image of the lowest resolution. This result image is expanded to the next higher level of resolution. The value of each pixel in the expanded result image is added to the value of each corresponding pixel in the decoded correction image of the same resolution level to form a result image of that resolution level. This process is continued until each pixel value in the expanded result image, which has been expanded to full resolution, is added to the value of each corresponding pixel in the full resolution decoded correction image to form the final image which is stored in memory for subsequent display.

BRIEF DESCRIPTION OF THE DRAWING

The foregoing and further features of the invention are shown in the accompanying drawing in which like elements are denoted by like reference designators and in which:

FIG. 1 is a block diagram of a digital video interactive system embodying the invention providing recording and reproduction of full-motion video, multi-channel digital audio and auxiliary (e.g., interactive) data using a compact disc read-only memory (CD-ROM) as the recording media;

FIG. 2 is a block diagram of a digital video encoder used in a recording portion of the system of FIG. 1;

FIGS. 3-9 are diagrams illustrating digital video signal formats at various stages of processing in the encoder of FIG. 2;

FIGS. 10-12 are diagrams illustrating two methods of processing "oversized" frames in the encoder of FIG. 2;

FIG. 13 is a block diagram of a formatter providing padding and dithering for use in the encoder of FIG. 2;

FIG. 14 is a block diagram of a pre-compression processor used in the encoder of FIG. 2

FIG. 15 is a block diagram illustrating details of a portion of the processor of FIG. 14;

FIG. 16 is a block diagram of a digital video compressor used in the encoder of FIG. 2, providing intra-frame and inter-frame region-specific coding, quantization by region area and frame-segmented variable length coding;

FIG. 17 is a flow chart illustrating operation of an intra-frame coder used in the compressor of FIG. 16 for compressing still video frames and the first frame of a motion video sequence;

FIG. 18 is a region diagram illustrating image edge analysis used in the compressor of FIG. 16;

FIG. 19 is a block diagram of a roughness estimator providing split/fill decisions for use in the compressor of FIG. 16;

FIGS. 20-23 are region diagrams illustrating bilinear absolute fill coding used in the compressor of FIG. 16;

FIG. 24 is a region diagram illustrating measurement of boundary errors;

FIG. 25 is a block diagram of an audio compressor used in the encoder of FIG. 2;

FIG. 26 is a diagram illustrating quad-tree regionalization;

FIG. 27 is a diagram illustrating binary tree regionalization of an image in the compressor of FIG. 16

FIGS. 28 and 29 are examples of split/fill coding diagrams for the regionalized image of FIG. 27;

FIGS. 30 and 31 are examples of "tree" code for the coding diagrams of FIGS. 28 and 29, respectively;

FIGS. 32A-J are region diagrams illustrating edge distribution analysis for determining a most favorable region split direction;

FIG. 33A is a flow chart for computer apparatus or determining a most favorable split direction in the compressor of FIG. 16 by analysis of the distribution of horizontal and vertical edges in a region;

FIG. 33B is a table listing of parameters for the apparatus of FIG. 33A;

FIGS. 34A and 34B are diagrams illustrating two forms of region splitting in the compressor of FIG. 16;

FIGS. 35A-35E are diagrams illustrating weighted median filtering in the compressor of FIG. 16;

FIGS. 36A-36C are diagrams illustrating non-linear low-pass filtering for use in the encoder of FIG. 16;

FIG. 37 is a diagram illustrating finding a most favorable split direction by polynomial fit comparisons;

FIG. 38 is a flow chart for computer apparatus implementing the split direction method of FIG. 37;

FIG. 39 is a flow chart illustrating operation of an inter-frame coder used in the compressor of FIG. 16 for coding the second frame and all subsequent frames of a motion video sequence;

FIG. 40 is a diagram illustrating region translation in the inter-frame coder of FIG. 39;

FIGS. 41 and 42 are vector and flow chart diagrams, respectively, illustrating selection of a best region search direction in the inter-frame coder of FIG. 39;

FIG. 43 is a diagram illustrating region translation and relative coding used in the inter-frame coder of FIG. 39;

FIG. 44 is a table illustrating region area dependent adaptive quantization used in the compressor of FIG. 16;

FIG. 45 is a flow chart illustrating operation of the apparatus in FIG. 16 providing area dependent quantization of FIG. 44;

FIG. 46 is a block diagram of a stream segmented variable length coder for use in the compressor of FIG. 16;

FIG. 47 is a diagram illustrating the format of data "streams" provided by the compressor of FIG. 16;

FIG. 48 is block diagram of a compressed digital video signal decoder used in the playback system 8 of FIG. 1;

FIGS. 49, 50 and 51 are examples of table listings of data stored in a region location memory of the decoder of FIG. 48 for absolute, relative, dyad and DPCM coded regions of FIG. 48;

FIG. 52 is a block diagram illustrating a memory organization for use in the decoder of FIG. 48;

FIG. 53 is a diagram illustrating relative region decoding of an inter-frame coded region by the decoder of FIG. 48;

FIG. 54 is a block diagram of apparatus providing the relative decoding of FIG. 53;

FIG. 55 is a diagram illustrating absolute region decoding in the decoder of FIG. 48 of an intra-frame coded region;

FIG. 56 is block diagram of apparatus providing the absolute decoding of FIG. 55;

FIG. 57 is a diagram illustrating DPCM decoding of a region in the decoder of FIG. 48;

FIG. 58 is block diagram of apparatus providing the region DPCM decoding of FIG. 57;

FIG. 59 is a table listing of area dependent adaptive quantization values for "dequantizing" pixel data in the decoder of FIG. 48;

FIG. 60 is a block diagram of apparatus for providing area dependent dequantization in the decoder of FIG. 48;

FIGS. 61 and 62 are diagrams illustrating dyad decoding in the decoder of FIG. 48; and

FIG. 63 is a block diagram of a dyad decoder for use in the decoder of FIG. 48;

FIGS. 64A and 64B depict a block diagram of an encoding method for selected images in accordance with the present invention.

FIG. 65 depicts an example of a quantized correction image divided into null and non-null regions.

FIG. 66 depicts a region table describing the regions of the exemplary quantized correction image depicted in FIG. 65.

FIG. 67 is a block diagram of a decoding method for selected images in accordance with the present invention.

DETAILED DESCRIPTION

The digital video interactive system of FIG. 1 comprises a recording system 6 and a playback system 8. The recording system includes sources 10, 12 and 14 which provide, respectively, a multi-channel sound signal S1, a color motion video signal S2 and an auxiliary data signal S3. An encoder 16 encodes and combines signals S1, S2 and S3 to form a digital recording signal S4 (hereinafter, "bit-stream") that is recorded on a compact disc read-only memory (CD-ROM) disc 20 by means of a CD-ROM recorder 18. Auxiliary data signal S3 may comprise interactive data associated with the video or audio signals or some other type of digital data which may be independent of the audio or video data.

The average data rate of the bit-stream S4 is controlled by a selection of encoding parameters to equal the standard CD-ROM record/playback bit-rate of about 1.2 mega-bits per second. The parameters are selected, as will be explained, so as to enable recording of up to one hour of full-motion digitally encoded color video, multi-channel digital audio and auxiliary data on CD-ROM disc 20.

The encoding of the digital full-motion color video portion of the recording signal to meet the relatively low channel capacity of the CD-ROM disc player requires very substantial data reduction. In a specific example to be described, this data reduction is on the order of about 150:1 for an exemplary video frame rate of 30 FPS (frames per second). To meet this critical requirement, while avoiding visible "artifacts" associated with conventional video compression techniques, encoder 16 converts the video signal S2 to a color frame sequential component form and separately subjects each frame of each component to a number of specially adapted processes as will be described. Briefly listed, these include variable sub-sampling, variable inter-frame and intra-frame compression employing what will herein be termed "region-specific" encoding, area dependent adaptive quantization, "segmented" variable length coding, reverse frame sequence reformatting, padding and frame dithering.

The selection of the individual processes, the selection of the share of data reduction provided by each and the selection of variable compression parameters (e.g., thresholds, operating modes and, particularly, when to quit compressing) represents critical choices in meeting the objective of encoding full motion color video for storage on CD-ROM digital audio tape (DAT) or other bandwidth limited media. Such choices depend on more than merely the channel capacity of the CD-ROM media. They depend as well on variables such as the video frame rate, the desired spatial resolution, certain specific characteristics of the video image content and on parameters of the decoder that is ultimately used for reconstituting the image. As will be explained, each individual video frame is converted to a component form and each component is divided to form a number of regions of picture elements ("pixels") or blocks of pixels. Each region is then individually "custom" encoded. This process is hereafter referred to as "region-specific" coding. The coding for each region is selected from a group of codes to enable the video decoder in playback system 8 to meet the strict requirement of completing all decoding tasks assigned to it in "real time", that is, within one video frame interval (a variable).

The foregoing and other aspects of recording system 6 are discussed in detail with reference to FIGS. 2-47, 61, 62 and 64-66. Details of playback system 8 are discussed later with reference to FIGS. 48-63 and 67.

Encoder 16, in FIG. 2, includes input terminals 202, 204 and 206 for receiving audio signal S1 from source 10, video signal S2 from source 12 and auxiliary data signal S3 from source 14, respectively. As an overview of the audio processing, signal S1 is subjected to channel selection and analog-to-digital (A/D) conversion, compressed with provisions for preventing frame-to-frame propagation of errors and stored for later recovery as blocks of audio data to be included in each video frame of bit stream S4 thereby providing audio/video synchronization.

In detail, audio signal S1 is applied to a channel selector and digital-to-analog (A/D) converter unit 208 which includes operator controls (not shown) for selecting the number of channels to be encoded and the channel sampling rate. One channel is selected for monophonic recording, two for stereo, four for stereo/bilingual, etc. The sampling rate currently used for high quality audio recording is 31.25 KHz which supports a 15 KHz audio bandwidth. The rate may be halved for standard quality or quartered for voice grade audio applications.

The data rate of the digitized audio signal S5 is reduced for recording by means of an adaptive differential pulse code modulation (ADPCM) encoder 210 which encodes the sample-to-sample differences of signal S5 to form a compressed digital audio signal S6. Since successive audio samples are often highly correlated, fewer bits are required to encode the sample differences. The term "adaptive" means that the encoder is of a type that changes the bit significance of encoded differences as a function of the previous encoded difference so as to provide fine resolution over a wide dynamic range.

Encoder 210 may be of conventional design but it is highly desirable for purposes of overall audio/video coding that provision be made either to bypass or reset it on a periodic basis so as to periodically encode an audio sample with full resolution. Illustratively, encoder 210 (FIG. 25) is reset once every 256 bytes. Recall that the audio signal is ultimately organized in a block form with one block of audio data included with each block of video data in bit stream S4. The formation of audio data "blocks" is supported via buffer store 212 which stores signal S6. Later the formatter 250 recovers the stored signal (S7) periodically on a frame-by-frame basis when the audio and video data are combined as will be explained. Typical audio block sizes currently used are 130 and 134 bytes for a video frame rate of 30 FPS and voice grade audio. The audio block size depends on the sampling rate, the number of audio channels to be recorded, and audio dithering within the formatter 250.

One reason for periodically resetting or bypassing DPCM encoder 210 is to prevent audio errors, which may occur in the CD-ROM transmission system, from propagating from frame-to-frame. This feature also facilitates subsequent editing of sequences to enable any frame to be chosen as an edit point. This feature is implemented as, shown in FIG. 25 by means of a comparator 214 which supplies a reset signal to reset input R of audio A DPCM encoder 21 when the byte count of the compressed audio signal S6 (produced by a byte counter 216) exceeds the byte limit set by a byte limit source 218.

VIDEO CODING OVERVIEW

The principal elements providing video encoding in FIG. 2 comprise a pre-compression processor 220, a digital video compressor 230 and an output signal formatter 250 which are described herein in detail with reference to FIGS. 3-47 and 64-66. As an overview, processor 220 provides conversion of video signal S2 to a non-standard format that provides a variable amount of data reduction, facilitates subsequent compression and contributes to certain features of the system relating to variable frame-rate processing for controlling spatial-temporal resolution. Some images are converted at one frame rate for subsequent display at an entirely different rate.

Compressor 230 employs, broadly speaking, four types of processing for reducing the quantity of digital data to encode a frame to a specific "optimum" value. This value is related to the CD-ROM channel capacity but varies as a function of several variables including the frame rate, the desired spatial-temporal resolution, and other factors relating to error propagation and visual appearance. The processing "types" include intra-frame region-specific coding for still frames and for the first frame and selected frames of a motion video sequence. Inter-frame region-specific coding is used for the second and subsequent frames of a motion video sequence. Encoded frames are subjected to further data reduction by two processes in compressor 230 which will be referred to herein as "area dependent adaptive quantization" and "segmented stream variable length coding". These processes are applied to each video frame to reach the desired "optimum" value noted above. Some sequences of frames may be repeatedly compressed with a change of compression thresholds to reach the optimum compression value.

From time to time, an "impossible" frame may be encountered which is hopelessly oversized and can not be reduced to the desired byte count by altering compression parameters without introducing noticeable visual artifacts. Such oversized frames receive special treatment in formatter 250 which combines the audio, video, auxiliary (e.g., interactive) and other data to create the recording bit-stream signal S4. Specifically, formatter 250 analyzes frames backwards from the last frame to the first and "borrows" space from short frames to hold the extra data of the oversized frames. Other functions provided by formatter 250 include adding "padding" data to undersized frames and dithering the number of bytes of data per frame to arrive at a specific average frame rate selected to keep the CD-ROM system operating at its maximum channel capacity and to avoid pauses during playback. Pauses are avoided because the recovery time (the "seek mode latency") of a CD-ROM player can be lengthy and unpredictable.

From time to time it may be desirable to select an image for special treatment in order to enhance the quality of the displayed video. An example of such an image is one which contains a substantial amount of fast and/or uncoordinated motion; that is, motion which does not occur substantially uniformly in the same direction over relatively large portions of the image. This image, hereinafter referred to as a selected image, is encoded at multiple levels of resolution as will be hereinafter described. Images may be selected based upon qualitative criteria such as, for example, the visual quality of the decoded digital motion video signal. Images may also be selected based upon quantitative criteria such as, for example when the mean square difference MSD exceeds a predetermined threshold value. Other quantitative criteria which are useable in selecting images for special treatment in accordance with the present invention include, without limitation, when the peak value of the difference images exceeds a predetermined threshold value or when the mean absolute value of the difference image exceeds a predetermined threshold.

Details of video processing are discussed in the following five sections entitled "Video Pre-Compression-Processing", "Video Compression Processing", "Post-Compression Processing", "Playback System", and "Video Decoding".

Video Pre-Compression Processing

Pre-compression processor 220 is coupled to input terminal 204 (in FIG. 2) for converting the standard video signal S2 to a non-standard form specially adapted for the particular types of compression and formatting functions subsequently employed in encoder 16. Specifically, each frame of the video S2 is converted in the "pre-compression" processor 220 to form three separate component frames comprising one luminance sub-frame and a pair of color-difference signal sub-frames. Each of the color-difference sub-frames is sub-sampled by a predetermined amount with respect to the luminance sub-frame which, itself, may or may not be sub-sampled with respect to the original video frame. The original video signal may be analog or digital and may be of component form, composite form or of another suitable form such as multiplexed analog component (MAC) form.

FIGS. 3, 4 and 5 illustrate the pre-compression processing of one frame of video signal S2 for the case where signal S2 is assumed to be an NTSC standard composite video signal, one frame of which is shown in FIG. 3. FIG. 4 illustrates an intermediate stage of precompression processing in which the composite signal has been decoded to RGB component form, stripped of synchronizing and blanking intervals and digitized to form RGB picture element (pixel) arrays representing them "active" video portion of each RGB field. The array dimensions, as illustrated, are 512 pixels horizontally by 240 pixels vertically for each RGB component.

FIG. 5 illustrates the final stage of pre-compression processing in which the digital RGB arrays of FIG. 4 have been converted to form a single luminance signal sub-frame (Y) measuring 256.times.240 pixels and two color difference signal sub-frames (I and Q) each measuring 64.times.60 pixels. The three sub-frames are stored in a memory (to be described) for subsequent individualized "custom" compression. Comparing FIGS. 3, 4 and 5 it is seen that one frame of signal S2 (FIG. 3) which requires 737,280 bytes in digital RGB form (FIG. 4) is reduced to 69120 bytes after sub-sampling, conversion and formatting (FIG. 5) thus providing an effective data reduction for the frame of a factor of about 11:1 for the assumed rate of 30 FPS.

An operator control unit 222 is provided in FIG. 2 for varying the sizes of the sub-frames of FIG. 5 as a function of the frame rate to facilitate varying the temporal and spatial resolution of encoded frames. This feature of the system relates to subsequent compression of the signals in the following way. The CD-ROM recording system can support a bit rate of about 1.2 mega-bits per second as previously noted. For 30 FPS (frame per second) video this channel capacity corresponds to a video byte count (8-bits/byte) of 5125.12 bytes per frame. Of this, typically about 4500 bytes per frame are available for video with the remainder being used for audio and other data. The video compressor (to be described) meets this requirement by compressing the formatted YIQ sub-frames by another factor of about 15:1 from 69120 to under 4500 bytes per frame for the assumed rate of 30 FPS. If the playback frame rate is halved then twice as much time (1/15th second) is available for decoding each frame and 9,000 bytes are available for encoding each frame. This increased decoding time and quantity of image data can be used in a variety of ways to provide improved image quality. One may for example, increase the number of pixels in the encoded frame or may more accurately encode the same number of pixels as at the higher frame rate (30 FPS).

FIG. 14 shows a specific implementation of precompression processor 220 for providing the variable sub-sampling and format conversion functions previously described. Processor 220 comprises an RGB decoder 1402 which converts the composite video signal to RGB component form. The RGB components are applied via anti-aliasing (2MHz) low-pass filters (1404, 1406 and 1408) to inputs of a programmable graphics workstation 1410. A suitable workstation is the "Adage 3000 Color Raster Display System". Operator control unit 222 of FIG. 2 comprises a terminal unit 222' (in FIG. 14) which supplies a "skip list" of fields, lines and pixels to workstation 1410 as well as anti-alias filter coefficients and sample rate control data. Data reduced subframes of Y, I and Q samples are produced by the work station and stored in a disc store 1412.

FIG. 15 is a block diagram illustrating the specific programmed configuration of workstation 1410 for use in processing video signal S2 to create the non-standard sub-frame signal format of FIG. 5. The anti-alias filtered analog RGB signals provided by filters 1404-1408 are applied to respective analog-to-digital converters 1502-1506 which digitize the signals at a rate selected to provide 512 pixels per active line interval as controlled by terminal 222' coupled to the workstation timing and control unit 1530. The digitized RGB signals (FIG. 4) are sub-sampled by two banks of switches 1510 and 1514. Switches 1510 are timed by unit 1530 to skip alternate fields of the RGB signals. Switches 1514 skip alternate pixels, so that the resultant digitized and sub-sampled RGB signals each comprise arrays of 256.times.240 pixels per frame.

A matrix 1516 converts the sub-sampled RGB signals to YIQ form. The I and Q color difference signals are each sub-sampled 4:1 both vertically and horizontally with respect to the luminance signal Y. This is provided by horizontal anti-alias low-pass filters 1518 (500 KHz), vertical anti-alias low-pass filters 1520 (60 lines/picture height), switches 1522 which skip 3 of 4 lines and switches 1524 which skip 3 of 4 pixels. The formatted Y, I and Q sub-frame signals (FIG. 5) are then stored in respective sub-frame locations in the disc store (e.g., a hard disc drive) 1412 for subsequent recovery and compression.

As previously explained, the filtering and sub-sampling parameters are variables which depend on the frame rate. For the specific examples of FIGS. 14 and 15 the frame rate is assumed to be 30 FPS. At different frame rates the operator inputs appropriate anti-alias filter coefficients, skip lists and conversion frequencies to timing and control unit 1530 via terminal 222'. At any frame rate or resolution, however, it is important that the original signal, of whatever form (analog or digital, component, composite or MAC), be converted as shown in FIG. 5 to a form comprising a luminance component Y and a pair of color-difference components that are filtered and sub-sampled both vertically and horizontally with respect to the luminance component. Color difference components I and Q are used as examples herein. Alternatively, the color components may be of other forms, such as R-Y and B-Y or U and V.

In the preferred embodiment of the present invention, which additionally employs multi-resolution encoding of selected images, pre-compression processor 220 of FIG. 14 is modified for processing a video input signal of MAC format by replacing RGB decoder 1408 with a MAC decoder providing YUV line sequential to YUV line parallel outputs, deleting the RGB/YIQ matrix in FIG. 15 and changing the sub-sampling parameters as needed to arrive at the individual (separated) sub-frames of luminance and color-difference components of FIG. 5. It will be appreciated that other variations are possible. For example, the source may be decoded to YIQ or YUV component form prior to filtering. Sampling may be done on either RGB or YIQ.

Video Compression Processing

After pre-compression processing the Y, I and Q video sub-frames are recovered one at a time from disc store 1412 for independent compression. The sequential recovery of sub-frames is indicated symbolically in FIG. 2 by sub-frame selector switch 224. In the position shown, switch 224 applies all Y sub-frames of a motion video sequence to compressor 230 which compresses and stores the complete sub-frames in a buffer store 232. Switch 224 is then advanced and the compression process is repeated for all of the I sub-frames of the sequence. Finally, compression is applied to all of the Q sub-frames of the sequence thereby completing an initial stage of compression of a sequence of color frames. Alternatively, switch 224 may be advanced to select the Y, I and Q sub-frames of one complete frame of the sequence for compression before advancing to the next frame of a sequence.

The compressed signal S9, as shown in FIG. 7, includes the three individually compressed sub-frames, each of which consists of a bitstream header (H) followed by the compressed data for the sub-frame (Y, I, or Q). The header identifies which sub-frame the data corresponds to, the size (number of pixels horizontally and vertically) of the sub-frame, a checksum for diagnostic purposes, and various tables used by the decoder. Further details of the format of signal S9 are discussed later with reference to FIGS. 46 and 47. The compressed data of FIG. 7 will hereafter be referred to as a video data "stream".

The feature of compressor 230 of individually compressing the YIQ sub-frames to form the compressed digital video "stream" S9 greatly enhances the compression efficiency. One reason is that even though the sub-frames represent the same image, they can differ from one-another dramatically because they represent different color measures of the image. Some images, for example, may contain no flesh tones. Others may contain no blue-green tones. Others may contain no color at all. A further reason for individual sub-frame compression relates to the statistical distribution of codes representing the image. Variable length coding is employed as one compression step. Variable length codes are selected in accordance with the frequency distribution or statistics of data to be coded. Since the statistics of Y, I and Q encoded sub-frames differ, individual variable length codes are employed that are optimized for each sub-frame. There are, in fact a number of separate statistical codes for each sub-frame as will be discussed.

After compression, the compressed video streams (S10) are recovered from buffer store 232 and applied to a byte count monitor 234 and to a decode time monitor 236 which identify, respectively, the number of data bytes and the decoding time for each individual frame of a video sequence. Since audio and auxiliary data will be added to each frame, the average byte count should be less than the total number of bytes allowed per frame in the bit stream S4. For encoding a video signal for playback at 30 FPS from a CD-ROM, the average number of bytes available per frame is 5125.12. This is determined by dividing the CD-ROM channel capacity by the video frame rate. Monitor 234 provides an accumulated average byte count over a sequence of video frames (alternatively monitor 234 may be arranged to count bytes on a frame-by-frame basis). This count is used for setting compression thresholds in a compression threshold control unit 238 to maintain the average byte count of signal S10 below 4500 bytes per frame. This allows room in the frame for audio and other data that is later added. Dashed lines are used to signify this closed loop procedure which is presently performed manually in a current implementation of encoder 16.

As previously noted, oversized video frames that can not be reduced to 4500 bytes are accounted for during reformatting by borrowing space from an earlier frame. The mechanics of this are discussed later, in the section on video post compression. Decode time monitor 236 measures the time it takes to decompress each sub-frame of the compressed digital video signal S10. This measurement may be accomplished by applying the signal S10 to a decoder such as processor 30 of the playback system 8 and measuring the processor decode time. For an exemplary playback rate of 30 FPS, the decode time of a frame should be no more than 1/30th of a second. When this monitor detects a larger decode time, thresholds in the threshold control 238 are adjusted to reduce the decode time of the "oversized" frame.

Alternatively, threshold 238 can be adjusted to merely keep the running average of the decode time below 1/30th of a second. With such a strategy, there is no need to repeat a compression, even if it exceeds the allowed decode time. In other words, the average can still be acceptable even if individual frames are not. As will be described subsequently, the playback system can cope with such temporary excesses in the decode time, without any effect on the playback rate, by using a technique of borrowing decode time from "short" frames (i.e., those frames that require less than 1/30th of a second to decode). This alternative technique of coding "oversized" frames applies where the average decode time is less than 1/30th of a second, and the playback system has adequate buffer storage. The amount of buffer storage needed by the playback system is monitored within the formatter 250 (FIG. 2), and if it is excessive, the threshold control is adjusted to reduce the decode time further. This alternative strategy for using the decode time monitor is desirable, because it permits a more accurate encoding of those frames that need a long decode time.

The decode time monitor may alternatively comprise an estimator, based on the known decoding time characteristics of the video processor 30. A careful examination of the decode process will reveal that it consists of a fixed number of well defined operations (say "A", "B", etc.) each of which requires a maximum length of time to complete. The encoder has available to it the precise bit stream that will be processed by the decoder. Hence the encoder can determine precisely how many times each of these operations will be performed for each sub-frame. The decode time estimate, T, is simply the sum of products: ##EQU1##

In the summation, each term "A.sub.i " represents the total number of times a particular decoding action is performed. The term K.sub.i represents the maximum decoding time of the action. Examples of such actions include relative, absolute, DPCM, dyad and quad decoding. Moreover, each decoding action may comprise several actions depending on where the pixel is in the region being decoded. To facilitate the use of such an estimator, the digital video compressor 230 stores the A.sub.i counts associated with each sub-frame in the buffer-store 232. They are retrieved by means of a connection (not shown) from monitors 234 and 236 to store 232. As an example of the use of equation 1 for estimating decoding time, the products that may be summed are (1) the number of regions described by respective fill data times respective first constants, (2) the number of pixels included in each type of region times respective second constants and (3) the number of rows of pixels included in respective types of regions times respective third constants. A constant term may be added to the sum of products to account for decoding steps common to all regions to be decoded.

FIG. 16 is a simplified block diagram of digital video compressor 230 which includes an input terminal 1602 for receiving the YIQ selected sub-frame signal S8 from switch 224 and another input 1604 for receiving the threshold control signal S11 from control 238. Mode switch 240 of FIG. 2 is indicated symbolically as switch 240' in FIG. 16. In the position shown (UP), mode switch 240' applies the video sub-frame signal S8 to an intra-frame region-specific coder 1610 which produces region-specific coded signal S12 that is applied via mode switch 240' to an area dependent adaptive quantizer 1630. The quantized region coded signal S14 is applied to a stream-segmented variable length coder 1640 as the final compression step in producing the compressed signal S9 for storage in buffer 232 (FIG. 2). Reversing the position of switch 240' applies the video input signal S8 to an inter-frame region-specific coder 1620 and selects the inter-frame coded signal S13 for quantization. Both encoders 1610 and 1620 are coupled to receive the threshold control signal S11.

In operation, mode switch 240' is placed in the UP position for encoding still frames and the first frame and selected frames of a motion video sequence using intra-frame coder 1610. Briefly stated, coder 1610 splits the frame into a number of small groups of similar pixels referred to herein as "regions". For each region a code is produced for representing the values of all pixels of the region. This technique provides very substantial data reduction (compression) because very few bytes of code are needed to specify where a region is, how big it is and what "fill" values are to be used to represent the region pixels. Further, the specific coding method used for each region is optimally chosen based on detailed characteristics of each region. This technique (herein, "region-specific" coding) of tailoring the encoding strategy, not just to individual images, but actually to individual regions within an image, greatly increases the amount of compression possible. Details of (1) how to find the regions, (2) how to code or "fill" the region (3) how to identify "good" and "bad" fill values and (4) what to do about "bad" fills are shown and described with reference to FIGS. 17-38, 65 and 66.

Switch 240, is placed in the down position for encoding the second frame and all subsequent, non-selected frames of a motion video sequence using inter-frame coder 1620. This different coding mode is used because once the first frame is encoded by coder 1610, the second and later frames can be coded on a "relative" basis using differences of the regions from frame-to-frame. One advantage of this "relative" coding of region differences is that smaller numbers are produced and smaller numbers can be represented using fewer bits by means of variable length coding in which shorter codes are assigned to smaller numbers. Details of (1) how to find the best direction to look for corresponding regions in a previous frame, (2) how to encode the region if found and (3) what to do if a corresponding region does not exist are discussed with reference to FIGS. 39-43 and 61, 62.

The region-specific coded signals S12 and S13 are subjected to what is termed herein as "area dependent" adaptive quantization in quantizer 1630 which provides further data reduction. Recall that frames are coded a regions of pixels. The size of each region varies with details of the overall image. For example, in areas of high detail there will be many small regions of a few pixels each. Conversely, in areas of low detail there will be a smaller number of regions but these regions will contain tens or even hundreds of pixels each. Quantizer 1630 achieves data reduction by variably quantizing region data as a function of the region area (i.e., the number of pixels in the region) such that small regions are more coarsely quantized (and thus require fewer bits) than larger regions. This process, and the psycho-visual effect that makes the quantization essentially invisible, will be discussed with reference to FIGS. 44 and 45.

The quantized region-specific coded signal S14 receives additional data reduction (compression) in variable length coder 1640. Briefly, the data describing an image is rather complex. It includes data describing how the regions were split and filled, how regions were shifted, parameters describing the fill values in terms of bi-linear polynomial coefficients and further data in DPCM, dyad and quad coded form. The point is that each video stream includes many types of data. These different types of data are formatted to occur in separate "segments" of each video stream. Coder 1640 determines the statistical occurrence of data for each individual segment of a video stream and assigns the shortest code to the most frequently occurring data within each segment. This is done independently for each one of the Y, I and Q sub-frames comprising a stream. In a preferred application, the different forms of region-specific codes are biased, so to speak, towards zero so that small numbers have a higher frequency of occurrence than larger numbers and thus are assigned shorter variable length codes by coder 1640. Details of the foregoing "stream segmented" variable length coding are described with reference to FIGS. 46 and 47.

Compressor 230 of FIG. 16 has been implemented by programming a digital computer as described with reference to FIGS. 17-47. For the computer, a model VAX 11/785 manufactured by Digital Equipment Corporation was selected. Compression speeds of a few minutes per frame have been achieved for typical motion video sequences. The principal goal of compressor 230 is not speed but rather is high quality for the images that are ultimately displayed. This goal is achieved in large part through the use of what is herein termed "region-specific" coding as will now be described.

Region specific coding comprises two actions, namely, (1) dividing the image into several regions ("regionalization"), and (2) selecting "optimal" fill parameters for each region. These two actions are performed concurrently, as will be described with reference to FIG. 17.

FIGS. 27-31 provide an overview of the regionalization process called binary tree decomposition. In this simplified example, the region 2702 consists of four subregions (2704, 2706, 2708, 2710) in which the pixels are assumed to have uniform gray levels (e.g., 141, 112, 90 and 98 out of a possible range of 256 gray levels). The pixel value distribution of this sub-frame is atypical, and is only intended to illustrate how binary tree regionalization is applied, and how the resulting decomposition can be efficiently encoded. In the more general case, the "fill" (i.e., the code representing the region pixel values) is described by the linear expression Ax+By+C, where the coefficient "A" represents the slope or brightness gradient in the horizontal (X) direction, "B" represents the gradient in the vertical (Y) direction and "C" represents a constant or uniform level of brightness over the region. In the example of FIG. 27, the terms A and B of the fill polynomial Ax+By+C are both zero.

Binary tree decomposition is performed by splitting a region in half, then possible splitting each of the resulting sub-regions in half, until the resulting sub-regions can be efficiently encoded. Later, in the discussion of FIG. 17, a number of strategies are described for deciding when a sub-region should be split, and in which direction would be split, horizontal or vertical. For FIG. 27, these decisions are easy. The first split, labeled split 1 in FIG. 27 splits the region horizontally into two equal halves. The top half 2704 can be efficiently encoded by the single value 141, while bottom half needs further decomposition. A further vertical split, split 2 divides the remaining area in half. The right half (2706) can be efficiently encoded by the value 112 and hence is not split any further. The left half, however, requires a further horizontal split, into two subregions 2708 and 2710 which can be efficiently encoded by the values 90 and 98.

Other regionalization strategies are possible. For example in quad-tree decomposition, instead of picking a single split direction, both split directions are used together. This leads to a regionalization as shown in FIG. 26 where region 2602 is split to from four more regions 2604-2608 one of which (2608) is further split to form four regions. Binary tree regionalization is the preferred mode because it has been found to normally result in fewer regions and hence fewer bits and less decode time.

FIGS. 28 and 30 illustrate the encoding of the absolute fill values and region locations of the example of FIG. 27. The term "absolute" as used herein signifies fill values obtained solely from the region data of the region being coded. The term "relative" as used herein signifies region fill values based upon frame-to-frame region differences. The inverted tree=like structure of the coding diagram 2802 in FIG. 28 represents successive divisions of region 2702 and is called a "binary tree" because each branch is split to form two branches. The top node of the tree represents the whole image. Each time a region is split, two new node values are formed. Terminal nodes of the tree are encoded with the region fill values.

The code (FIG. 30) to describe the complete tree consists, therefore, of two types of data: "values," which are the fill values, and "actions," which are the split or fill commands. The "actions" and "values" are encoded using the same code "space". That is, they each comprise variable-length- encoded non-negative numbers. It is always possible, however, to distinguish between an action and a value based on context, that is, the position of the action or value in the code sequence. For instance, in the example of FIG. 28, when a "fill" action is encountered, the next number must be a value. The next item after this value must be another action, etc.

In more detail, the tree description data is ordered using the following rule. For each node that is split, all the data pertaining to the "top" node (if a horizontal split) or the "left" node (if a vertical split) is listed, followed by all the data for the other node. This is an inherently recursive procedure that begins with the root node of the tree and operates successively on nodes of the tree until all terminal nodes of the tree are reached. For the example tree in FIG. 28 this yields the tree code shown in FIG. 30. This short code, together with the dimensions of the original image, gives all the information one needs to specify the size and location of every region and the value of every pixel in the image 2702. The "H" and "V" symbols signify horizontal and vertical splits. The "F" symbol signifies a fill action.

FIG. 29 illustrates an alternative and preferred format for encoding the binary tree data for the regions of FIG. 27. It differs from the method of FIG. 28 in that the fill data is encoded as node differences rather than as the actual values of the end nodes. This requires calculation in the decoder to recover the actual fill values but has an advantage in that the encoded values are numerically smaller. Compare, for example, the values 141, 90, 98, 112 of FIG. 30 with the values -7, -37, 18, and 8 of FIG. 31. Since the values are encoded using a variable-length code, this produces greater coding efficiency, since this weights the statistics of the values more heavily towards small numbers.

The coding procedure which results in the binary tree illustrated in FIG. 29 is performed as follows. First, the encoding process which develops the binary tree of FIG. 28 is performed. Next, pairs of fill values at terminal nodes from the same branch point are differenced and averaged. The difference value is assigned to the branch point and is the value which will subsequently be encoded in the tree description. The average value is also assigned to the branch point, but only for the purpose of determining other nodal or branch values working backwards up the tree. That is, the average values are averaged and differenced with absolute or average values from a corresponding node on a parallel branch. The difference value is assigned to the branch point as the value to be encoded, and the new average value is used to determine the next difference and average value working hierarchically up the tree. Differences are determined by subtracting the left nodal or branch value from the right nodal or branch value.

In the example illustrated the terminal nodal value 90 is subtracted from the terminal value 98 to produce the difference value +8 which is assigned to the branch point designated "split 3". The average of the nodal values (90+98)/2=94 is also applied to the branch point and shown in angle brackets. The average value 94 at the branch point "split 3" is differenced and averaged with the terminal nodal value 112 to generate the difference +18 and average 103 which are assigned to the branch point designated "split 2". This process is carried out all the way up the tree until the firstmost branch point is reached.

A further encoding efficiency is accomplished at the top node of the tree by referencing the top node to the value 128. That is, the value 128 is subtracted from the average value established for the top node. In this example, the average value for the top node or branch is 122. Subtracting 128 from 122 yields a value of -6. This value is assigned the first position in the encoded tree description.

The tree description is illustrated in line "A" FIG. 31 and includes in order of occurrence the value -6 followed by the direction "H" of the first split, followed by the difference value assigned the first branch, followed by the instruction to fill the left branch, followed by the direction "V" of the next split, followed by the difference value assigned that branch point etc. This code contains the same number of instructions as FIG. 30 but has smaller numerical values.

For decoding, the average value of the first two nodes (141 and branch point "split 2") is calculated by adding -6 to +128 to yield 122 which equals (R+L)/2 where R and L are the right and left node or average values respectively. The difference value, -38, transmitted in the code is equal to (R-L) i.e., R-L=-37. But (R+L)/2=122. Solving these equations simultaneously yields the left nodal value, L, equal to 141 and the right branch average value R equal to 103. This process is continued down the tree. Occasionally, the averaging process described above may require dividing an odd number by 2. This may be dealt with by having the encoder and decoder agree on the same truncation or rounding strategy.

The foregoing binary tree encoding methods require encoding negative numbers. This is accomplished in the following way: A positive (or zero) number P is encoded by the positive number 2P, and a negative number N is encoded by the positive number 2 N -1. Positive and negative numbers are differentiated because all positive values (2P) are even and all negative values 2 N -1 are odd. This technique avoids placing a sign bit in the most significant bit position of fixed bitwidth codewords and therefore eliminates extra bits between the sign bit and value bits for small values. When using this coding scheme, the tree code assumes the values in line "B" of code in FIG. 31.

As a further efficiency measure, it has been found useful to encode the "actions" and the "values" using different variable-length codes. Since there are only a few different actions, and many more possible values, their statistics are significantly different. Thus, using separate variable-length codes produces some additional code savings.

The above description applies specifically to images containing absolute fills by constants. In actuality, there are five types of fills currently used, namely: absolute, relative, DPCM, dyad and quad. Each of these has its own separate action code. The node values discussed above only apply to absolute fills. The fill values for the other four types of fills are encoded separately in different code "segments" that are later combined with the split/fill segment to form the overall video bit-stream. The use of many code segments is described in a subsequent discussion of "segmented stream variable length coding" and FIGS. 46 and 47.

Vertical splits, V, and horizontal splits H, have approximately equal probabilities of occurrence. An alternative way of encoding this information has been found that uses fewer bits on average. It has been found that most splits tend to split the longer dimension (e.g., regions 3402 and 3404 in FIG. 34A). Such a split is called a simple split and is encoded as S. If the dimensions of a region are equal and it is to be split horizontally, it is coded as a simple split, S. This encoding is not ambiguous to the decoder because the region dimensions are available and if they are equal, the split code S is interpreted as a horizontal split. Any split which is not a "simple" split is called an alternate split and is encoded as A (e.g., regions 3406 and 3408 as shown in FIG. 34B). Because of the greater probability of occurrence of simple splits the variable length encoder is able to use fewer bits on average by assigning a shorter code to represent simple splits. With this encoding strategy, the tree of FIG. 29 would be encoded via line "C" of FIG. 31. While this approach does decrease the code size, it has the disadvantage that the decoding time is increased by the need to deduce vertical and horizontal split actions (V and H) from the simple and alternative split codes (S and A).

For images containing relative or dyad coded regions (described later), the region shift values (X.sub.o, Y.sub.o) are also encoded in the split/fill tree description, using another action (called "shift") followed by the two shift values. As will be explained, a "shift" value is a measure of the horizontal (X.sub.o) and vertical (Y.sub.o) offset between a region of a given frame and a corresponding region of a previous frame. The shift is a measure of frame-to-frame motion of a region. These values are encoded in the tree description, rather than separately, for further efficiency of coding. Since many regions tend to have the same X.sub.o, Y.sub.o values, the "shift" action is defined to mean "apply these X.sub.o, Y.sub.o values to this node and all child nodes of this node". Advantageously, this permits the shift values for regions having the same shift to only be encoded once.

FIG. 17 and FIGS. 18-38 related thereto provide the details of intra-frame coder 1610 that encodes all "still" frames and the first frame of a motion video sequence. FIG. 17 is a flow chart illustrating each step in the encoding process provided by coder 1610. This "software" implementation of coder 1610 is presently preferred. However, it will be appreciated that the individual processing functions may readily be implemented by individual elements of apparatus providing the functions shown in the flow chart. Specific examples of such "hardware" implementations are included in FIGS. 18-38.

The first step for intra-frame coding (FIG. 17) comprises the START step (1702) and is initiated by placing mode switch 240' in the UP position (FIG. 16) for still frames or the first frame of a motion sequence. Simultaneously, switch 224 (FIG. 2) is placed to select the Y sub-frame. All of the Y sub-frames will be compressed before advancing switch 224 to select the I and finally the Q sub-frames.

As a brief overview, FIG. 17 has four main actions. Prefiltering occurs in step 1703. Sub-region stacking and selection is provided by steps 1730, 1732, and 1704. This is the process (to be described) by which the same strategy can be applied to every sub-region regardless of its size. Linear fill encoding, provided by steps 1706 to 1716, determines whether a region is suitable for encoding as a plane surface (Ax+By+C), and if so, what the details of the encoding should be. DPCM encoding, provided by 1722 and 1724, are used for regions that are not suitable for linear fill encoding. Step 1720 performs post-processing on the resulting encoding to further reduce code size and decode time. Processing provided by steps 1734 and 1736 check for the end of the sequence of still frames or the end of the first frame of a motion sequence.

The first action in FIG. 17 is to apply filter 1703 to the "image" of video signal S8. Filtering removes extraneous detail which improves the speed of the compression process, it decreases code size, and decreases the decode time because larger regions tend to be produced. Since simple low pass filters also tend to blur the image, nonlinear filters are preferred that remove low amplitude noise but preserve high amplitude information. There are many kinds of filters that can be used for this purpose, a preferred form being a cascade connection of a weighted median filter and a modified linear low pass filter. The modification is described subsequently in reference to FIG. 36.

FIGS. 35A-E illustrate the weighted median filter. FIG. 35A illustrates a pixel 3502 to be filtered and its eight nearest neighbors. FIG. 35B shows an array 3504 of weighting factors for filtering pixel 3502 to produce the weighted value (12) for this one pixel 3506 indicated in FIG. 35C. The weighting method is shown in FIG. 35D. First, the values of pixel 3502 and of its eight neighbors are listed in ascending order (3503). The un-weighted median is seem to have the value of "11" units. One half of the values are higher and one half are lower. The weighting values (3504) from FIG. 35B are listed beneath the ordered values 3503. They determine the number of times each value is repeated to form an ordered list 3508. In the example, the four corner pixels (11, 9, 1, 17) have weights of unity and are listed once in list 3508. The center side pixels (12, 5, 10, 13) have weights of 2 and so are listed twice in list 3508. The central pixel (15) to be filtered has a weight of 5 and so is listed 5 times in list 3508. The weighted median value (12) is the value taken from list 3508 for which half the weighted values are less and half are greater. This value (12) is the filtered value of the central pixel of the region 3503 as shown (3506) in FIG. 35C. The remaining pixels are determined the same way by applying the weighting array 3504 to each pixel and its 8 near neighbors.

The weights of FIG. 35B were selected for purposes of illustration to keep list 3508 reasonably short. Exemplary weights for an average scene are listed in FIG. 35E which shows corner weighting of 2, mid-side weighting of 4 and center pixel weighting of 13. One may vary these weights to achieve controlled directional spatial detail redirection while preserving edge transitions. One may, for example, change diagonal contributions to the filtered value by changing the corner weights. Vertical and horizontal contributions are determined, respectively, by the values of the top and bottom or the left and right weights. Accordingly, the weighted median, in addition to preserving edges due to being a median filter, can exhibit selective directional characteristics due to the weighting factors.

FIG. 36A illustrates a modified low-pass filter suitable for use in the filtering step 1703 which removes unimportant detail while preserving edge transitions. The filter comprises the combination of a linear transversal filter 3602 and a modifier 3620 (both outlined in phantom). Briefly, the modifier detects edges and generates a "damping factor D" which is used to selectively mix the low pass filter input and output signals as a function of the edge amplitude to thereby suppress small changes while preserving larger signal transitions. Filter 3602 comprises a cascade connection of pixel delay elements 3604 and 3606 which delay an input signal at input terminal 3608 by one and two pixel periods. An adder 3610 produces a low pass filtered signal by forming a weighted sum of the input signal (weight=1/4), the pixel delayed signal (weight=1/2) and the two-pixel delayed signal (weight=1/4). Modifier 3620 includes a subtractor 3622 which detects transitions by subtracting the low pass filtered signal of adder 3610 from the unfiltered one-pixel delayed input signal provided by delay 3604. The output of subtractor 3622 is applied to a non-linear detector 3624 which produces complementary control signals D and 1-D for controlling multipliers 3626 and 3628, respectively, which multiply the filtered and un-filtered pixel delayed signals. An adder 3630 adds the multiplied signals to provide an output signal at terminal 3632. Detector 3624 may be a ROM programmed to output the values D and (1-D) responsive to the differences from subtracter 3622 applied as addresses.

FIG. 36B illustrates the non-linear characteristic of detector 3624 for producing control signal D (hereafter, the damping factor) and 1-D as a function of the subtractor 3622 output (difference signal). For small differences characteristic of small detail features of an image the factor D is near unity. Accordingly, multiplier 3626 selects the filtered signal of adder 3610 as the output. For larger transitions the value of D decreases and so signal 1-D increases causing multiplier 3628 to select more of the unfiltered signal as the output. For very large transitions (D near zero) filter 3602 is essentially bypassed thereby faithfully preserving the full amplitude of large edges. This is further illustrated in FIG. 36C in which 3640 indicates the occurrence of a step transition for the input pixels represented by open circles. Dashed line 3642 illustrates the response of a conventional low pass filter which, as shown, tends to smooth both large and small pixel variations. The solid circles indicate the response of the modified filter of FIG. 36A. The damping factor D is low for pixels approaching and leaving the transition zone whereby small pixel variation (detail) are filtered. The damping factor is low in the transition zone thereby bypassing the filter and thus preserving the steep transition.

Returning now to FIG. 17, steps 1704, 1730 and 1732 select and list regions for subsequent analysis. This process has one of two possible effects. It may yield an encoding of the region via step 1716 or 1724, and hence removal of the region from further analysis. Or it may cause the current region to be split 1726, and both halves put on a list of regions for further examination. Each split reduces the size of the region. When the region gets small enough it encounters the test for a minimum size region, 1722. This test prevents unlimited splitting, and hence forces eventual encoding of every region.

Initially, the region selection step 1704 treats the entire image sub-frame as one single region. During this processing, it is likely that a split 1726-1732 will occur, resulting in two subregions that need to be processed. Boxes 1730 and 1732 "push" two regions onto a list of regions waiting to be removed by 1704. By "push" it is meant that the region identities (locations) are stored in the region list. The next time select region 1704 is used, the top region on the list is encoded as will be described. The order in which regions are processed is determined by the order in which they are placed on this list. For a horizontal split (1732) the bottom half region and the top half region are each added to the list and the top half is first to be encoded. For a vertical split 1730 the right half region and the left half region are added to the list with the left half region being first to be encoded. This orderly sequencing of how regions are examined is known to the video processor 30 (FIG. 1), and is used by it during decoding to interpret the sequence of codes used to represent each image.

Linear fill encoding is provided by steps 1704-1716 as will now be described. It will be recalled that region-specific coding gets its strength from the ability to choose optimal encoding strategies for each individual region. Linear fill encoding is tried first, since it can describe a large region with very few bits. If linear fill encoding is not possible, the region is split (1726) and linear fill encoding is again tried for each sub-region. As we shall see, the number of bits required to encode a region using linear fill techniques does not increase as the size of the region increases, so it is an excellent encoding strategy for large regions. Only when the resulting subregions fall below a minimum size (TEST 1722) is another encoding technique used.

A mean square error measure (MSE) is one method used to determine whether or not linear fill encoding is acceptable (1714). Since this measure is an average over the entire region, there may be localized portions of the regions where the deviation from a plane surface is quite large and visually apparent, yet the MSE may be acceptably low. To avoid this problem a roughness estimator 1706 is applied to the region before attempting linear fill coding (1710). If the region fails this test (1708) and is not of a minimum size (test 1722), it is split (1726-1732) and the same processing is applied to the resulting sub-regions so formed.

Roughness of a region in this example is determined by detecting edges in the region. FIG. 18 illustrates a simple definition of edges, based on large changes in gray level between adjacent pixels. FIG. 19 is a block diagram of apparatus providing edge detection.

In FIG. 18 a region 1802 is shown comprising four rows and four columns of pixels. Luminance (Y) signal values are indicated for the 16 pixels. By definition, an edge exists between adjacent pixels whose values differ by more than a threshold value (input via threshold control 238). A typical threshold value may be 25 units for a Y signal quantized to 8-bits (i.e., a 256 level scale from black to peak white). Using a level of 10 brightness units as an exemplary edge threshold, it is seen that there are two vertical edges (V) and three horizontal edges (H) in FIG. 18.

If region 1802 were "split" (i.e., divided) horizontally between rows 2 and 3, the result would be two regions neither of which contains a horizontal edge. Notice also that the pixels of rows 3 and 4 range only from 3 to 5 in brightness, which is less than the edge threshold. Thus, horizontal splitting of region 1802 provides two regions which have no horizontal edges and one region (rows 3 and 4) which may be encoded with a "fill" value of "4" that fairly represents the Y signal value for all eight pixels. Rows 1 and 2, however, still contain vertical edges V. By splitting this region vertically between columns 2 and 3 two more regions are formed and neither contains edges. The 4.times.4 region containing the uniform pixels "23" can be filled with a single value. The 4.times.4 region having pixel values 1, 3, 9 and 12 has no horizontal or vertical edges but is not "fillable" with a single value because of the presence of a pronounced "gradient". Filling of such a region requires a plane surface fill via 1710.

The fill procedure begins at step 1710 by using the method of least squares to find the coefficients A, B and C of the bi-linear polynomial (Ax+By+C) estimate of the region pixel values. Boundary error and MSE error measurements are made (1711) and tests 1712 and 1714 are performed to determine acceptability of the fill value.

If a linear fill is not acceptable, because of the results of any of the tests 1708, 1712 or 1714, then the next step is usually to split (1726) the region. However, the test at 1722 prevents splits if the region size is already small. This is done for two reasons. First, the code size for linear fill encoding is nominally independent of the region area. However, once the region falls below some predetermined size, other encoding methods require fewer bits. Second, there can be delays in the decoder (FIG. 48) whenever a new region must be decoded. If the image were represented using a large number of relatively small regions, these delays can become sufficiently significant to interfere with the requirement that images be decoded at a rate such as 30 FPS.

When the minimum size test indicates a minimum sized region, 1724 encodes the region in DPCM (Differential Pulse Code Modulation) format. In this encoding method, the difference between every pixel and its left neighbor is transmitted. However, since it does not have a left neighbor the first pixel of each line of the region is transmitted as the difference between itself and the pixel immediately above it. The first pixel of the first line of the region (which has no pixel to its left or above it) is transmitted as the difference between itself and a mid-gray value, namely 128. The resulting differences may be additionally data reduced by passing them through a nonlinear quantizer. For decoding purposes, a table describing the nonlinear quantization levels may be transmitted to the decoder in the header part of the compressed video bit-stream.

A number of DPCM quantizers may be used. This is practical because region-specific coding enables matching the coding technique to the individual region. These quantizer tables differ in the dynamic range of the differences. The DPCM encoder 1724 examines the statistics of each region and decides which quantizer table is better suited to that region, and generates a code specifying which dequantizing table is to be used in decoding it.

FIG. 19 shows apparatus for providing the roughness test. FIGS. 32 and 33 which are described later, show apparatus for determining the split direction. In FIG. 19 the region data is stored in a memory 1902. Subtractors 1904 and 1906 subtract the region pixels by row and by column, respectively. Threshold detectors 1908 and 1910 compare the differences of pixels with a threshold value Th (e.g., 10 is assumed) to detect the horizontal and vertical edges which, in turn, are counted by counters 1912 and 1914 and stored in an edge memory 1916. The stored edge data is applied to a zero detector 1920. A HIGH output of detector 1920 signifies that there are no horizontal or vertical edges in the region and initiates the process of finding a value (or values) to fill the region. If edges are present, the edge data in memory 1916 is applied to split logic circuit (FIG. 33A) for finding a split direction as described later.

Alternative definitions of roughness are also possible. For example, one can estimate the slope between adjacent pixels by multipoint interpolation techniques. If the slope is larger than a threshold and is not constant over the region, then the surface is rough.

Returning to FIG. 17, it will be assumed that test 1708 finds no edges present in the region. This initiates the process of finding a fill value for representing all pixels of the region as a group. This is done in step 1710 by generating coefficients A, B and C of a bilinear polynomial (Ax+By+C) estimation of pixel values for the region using the method of "least squares" estimation. The estimated pixel values are compared with the actual values for all pixels of the region to determine the closeness or "fit" of the estimate. The "fill" value comprises the coefficients A, B and C of the polynomial that satisfies two tests, namely, a "boundary error" test 1712 and a "mean square error" test 1714.

FIGS. 20-24 show in detail how the polynominal fill values are found and how the two tests for acceptability of the fill are performed. FIG. 20 represents the most elementary case where all pixels of region 2002 are of the same value (5 units). There is no brightness gradient in the horizontal ("x") direction therefore the coefficient "A" which signifies the horizontal brightness gradient or "slope" equals zero. There is no brightness gradient in the vertical direction either. Therefore, the coefficient "B" representing vertical slope is also zero. The only coefficient remaining is "C", which is the polynominal coefficient representing the constant or uniform signal level of 5 units. The code to represent this simple case is shown as ABS 0 0 5 to signify what will be called absolute coding hereinafter to distinguish region codes based on the actual signal values from region codes based on frame-to-frame differences (hereinafter relative codes). Decoding of region 2002 comprises assigning a value of 5 to every pixel in the region.

In FIG. 21 the region 2102 includes a horizontal brightness gradient of one unit per pixel in the x direction. Starting in the upper left hand corner the values are 4, 5 and 6. The fill polynomial Ax+By+C therefore has coefficients A=1, B=0, C=4 (taking the upper left pixel as a reference level). The code is therefore ABS 1 0 4. This is decoded by assigning a value 4 to the upper left hand pixel and adding a gradient correction to each horizontal pixel of one unit of brightness per pixel. Since there is no vertical gradient, successive rows are replicas of the first row. FIG. 22 is similar except that the gradient is vertical rather than horizontal.

In FIG. 23 the region 2302 has both horizontal and vertical gradients. Taking the upper left corner pixel as a reference, the polynomial constant C equals 5, the brightness increases by 1 unit per pixel in the x direction and changes by -1 unit in the y direction. The code is therefore ABS 1 -1 5. Decoding is effected by assigning a value of 5 to the first pixel and incrementing its value by one unit per pixel horizontally. The second and third rows are similarly decoded after decrementing the starting pixel value by the vertical slope value (minus one pixel per column).

The above examples suggest that the slope values A and B in the polynomial Ax+By+C are always integers. It has been found, however, that most slopes that occur in real images are not integers, and in fact are usually less than 1 in absolute value. The A and B values are, therefore, specified in units of 1/256ths; i.e., binary numbers with the least significant 8 bits representing the fractional part of the slope.

In FIGS. 20-23 the polynomial coding is exact. That is, for the exemplary values given, it just happens that upon decoding the decoded regions will have exactly the same values as the original regions. In practice this ideal situation may not occur very often. For this reason measures are needed to determine if the bi-linear polynomial fill values produce a reasonably close replica of the actual pixels values when the region is ultimately decoded. The tests used are the mean square error (MSE) and the boundary error test of the polynomial fit as illustrated in FIG. 24.

FIG. 24 illustrates a specific case where the polynomial fill is not exact and acceptability of the fit is tested. Region 2402 is a region of pixel values as they appear in the image. Array 2404 is a corresponding set of values that is produced when using a polynomial of the form Ax+By+C, the coefficients of which were determined using least squares analysis on the data of region 2402. Array 2404 shows a uniform horizontal gradient of 1 and a uniform vertical gradient of 1. Array 2406 is a set of values corresponding to the errors between the actual pixel values and the corresponding generated pixel values. The MSE is obtained by taking the square root of the average value of the squares of the values in array 2406. For this specific example the MSE is 1. This value is compared with a threshold value to determine acceptance or rejection of the fill data.

The boundary error is based on analysis of the 12 pixels that constitute the boundary of this region. It has been found that boundary errors require tighter tolerances than errors interior to a region if false edges are not to be generated between abutting regions. One possible boundary test is to compare each of the boundary difference values in array 2406 against a predetermined threshold value, e.g., 10, and if any of the differences exceed this threshold, to reject the coefficients.

A preferred embodiment of the boundary test looks for coherence in values. It has been discovered that boundary errors are more visible when they are coherent; that is when adjacent pixels have errors with the same sign. Random differences such as those along the top, bottom and left side of array 2406 are unlikely to produce a false edge in a reproduced image. In the preferred embodiment, the boundary estimator 1711 identifies contiguous blocks of boundary errors which have the same sign. Only boundary pixels that are part of a block whose length is greater than a threshold (from threshold control 238), typically 2, may be considered. For example, in array 2406 of FIG. 24, only the block of error values having the value +1 on the right boundary would be considered, a boundary error estimator of "1" generated. The average block error value of such coherent pixels is compared against a threshold value. If the error exceeds the threshold value the fill is rejected.

In summary, tests at 1712 and 1714 are performed to see whether the fit represented by Ax+By+C should be accepted. The test at 1714 might fail because the average deviation from a plane surface is too high. In other words, the MSE test essentially measures closeness of the fit of the encoded pixel values (Ax+By+C) to the actual pixel values. An MSE threshold is selected and used as an input for threshold control 238, and is typically 4. The test at 1712 might fail if the errors along the boundary might tend to introduce a visible transition between adjacent regions when they are decoded and displayed. boundary threshold is also used as an input for threshold control 238 and is typically 20.

Returning to FIG. 17, once the decision has been made to split a region, the region is analyzed to find the best split direction. If test 1728 indicates the need for a vertical split, step 1730 splits the region into a left half and right half region. If test 1728 indicates the need for a horizontal split, step 1732 splits the region into a top half and a bottom half. If the split is horizontal, the compression process is repeated starting with the next region selected (1704) being the upper one of the split regions. If the split is vertical, the compression process is repeated selecting (step 1704) the left one of the split regions. This process of splitting and compressing continues until all the regions created by the splitting process are encoded (step 1705). Then remerge (step 1720 to be described) is done and the intra-frame compression operation ends (1736) for the Luminance (Y) signal subframe. A complete color frame is encoded by repeating the compression process for the remaining I and Q sub-frames. If additional still frames are to be encoded, the next frame is selected (1735) as a result of a "last frame" test 1734 and the process repeats.

Finding a split direction for a region to be split (1726 of FIG. 17) may be accomplished by means of: (1) edge distribution analysis; or (2) polynomial fit analysis. Each of these procedures, and specific apparatus for providing the split direction indication are described as follows with reference to FIGS. 32-38.

Edge distribution analysis is used to find a most favorable split direction for cases where the reason for splitting the region is the presence of edges in the region (e.g., failure of the roughness test 1708 in FIG. 17). FIGS. 32A-J provide examples of regions to be split using edge distribution analysis. FIGS. 33A and B, discussed later, show how the analysis may be implemented.

FIGS. 32A-E illustrate five cases where a vertical split is favored over a horizontal split. In FIG. 32A the region 3202 contains two vertical edges which lie on the vertical bisector of the region. Nothing would be gained by splitting this region horizontally since each subregion would still contain an edge whereas a vertical split will produce two regions (1, 3 and 2, 4) neither of which contains an edge. In FIG. 32B there are no edges in the right half of region 3204 therefore a vertical split will produce one sub-region having no edges. A vertical split is similarly favored in region 3206 of FIG. 32C. In FIG. 32D there are many vertical edges in region 3208. Although edge free regions will not be produced here, a vertical split is favored since it is more likely that an edge will be eliminated than if it were split horizontally. In FIG. 32E region 3210 contains several horizontal and vertical edges with no clear advantage. For this case aspect ratio analysis is used. Specifically, region 3210 is wider than its height and a vertical split is selected because it will tend to produce sub-regions that are more square. On average, this has been found to result in fewer regions being produced for typical images where the edge analysis shows no clear advantage. Regions 3212-3220 of FIGS. 32F-J are split horizontally for reasons similar to those discussed with regard to vertical splitting of regions 3202-3210.

From the foregoing it is seen that there are a number of factors which have a bearing on selection of a split direction. In practice, the split decision is not as simple a matter as it might appear from the examples given because real images produce regions having many more edge distributions than the relatively simple examples of FIGS. 32A-J. The flow chart of FIG. 33A and associated table of FIG. 33B illustrate a method for finding a split direction which takes into account the complex edge distributions encountered in splitting regions of typical video images.

In FIG. 33A the split direction analysis starts (3302) with the step (3304) of detecting edges in four quadrants of the region to be split. As shown in FIGS. 32A and 32F, the quadrants are designated 1 for the top left, 2 for the top right, 3 for the bottom left and four for the bottom right. Next (step 3306) four functions V13, V24, H12 and H34 ar generated as listed in the table of FIG. 33B. The function V13 equals the sum of a number of terms (Col. 1 of FIG. 33B) relating to edges in the left half of the region (i.e., quadrants 1 and 3). The function V24 is comprised of the sum of a number of edge distribution terms for the right half (quadrants 2 and 4) of the region. The function H12 and H34 similarly relate to sums of terms for the top and bottom halves of the region. Specific terms are discussed later.

At step 3308 multiplication is performed to produce what is herein termed a vertical factor VF and a horizontal factor HF. The vertical factor VF equals the product of the region height (H) times V13 times V24. The horizontal factor HF equals the product of the region width (W) times H12 times H34. The factors HF and VF are compared at step 3310. A vertical split is made (3312) if VF is less than HF and the program ends (3316) otherwise the region is split horizontally (3314) and the program ends.

In operation, any factor which tends to make VF smaller than HF favors a vertical split. From step 3308, for example, if the edge analysis factors V13 and V24 ar equal to H12 and H34 than a vertical split will be favored if the region height H is less than its width. This condition corresponds to the example of FIG. 32E where it was seen that there was no clear advantage shown by edge analysis but the aspect ratio test favored a vertical split to obtain sub-regions that are more square.

The factors of FIG. 33B (calculated in generation step 3306) become important in cases where the vertical and horizontal edge distributions are not uniform in a region. Factors tending to make V13 (left half) or V24 (right half) small favor a vertical split. For example, if the number V1 of vertical edges in quadrant 1 is equal to the number V3 of vertical edges in quadrant 3, then the factor V1-V3 will be zero thus favoring a vertical split. Vertical and horizontal edge difference factors V1-V3, H1-H3, V2-V4, H2-H4 etc. are included for all quadrants. These terms are all squared in FIG. 33B to give them added weight. The terms Ho and Vo represent the number of horizontal and vertical edges, respectively, which will be eliminated by the split (i.e., edges falling on the split line). The elimination of edges has also been found important and so these terms are also squared to increase their weight in the split direction test. The remaining factors H1, H2, V1, V2 etc. represent the number of edges per quadrant. If, for example, there are many horizontal edges in the left and right halves a horizontal split will be favored (e.g., FIG. 32I). Other examples of the application of the table of FIG. 33B are apparent. For example, the terms may be weighted differently than shown. Also, the region may be more finely subdivided and additional terms for split direction analysis added to the table.

The foregoing assumed that the reason for a split was a YES exit from the roughness test 1708. If the reason for the split was a linear fill that failed the MSE test (1714) or the boundary error test (1712), then a different method for choosing the split direction is appropriate.

FIG. 37 illustrates analysis of the polynomial fit to determine the split direction. Recall from FIG. 17 that a region will always be split if either the mean square error (MSE) or the boundary error tests of the bi-linear polynomial Ax+By+C is unsatisfactory. In FIG. 37 the vertical portion (By+C) of the bi-linear polynomial is compared with the vertical Luminance values Y.sub.V (i.e., the average luma values by row). Also, the horizontal portion (Ax+C) is compared with the horizontal Luminance value Y.sub.H (i.e., the average luma value by column). The comparison providing the better "fit" (i.e., lower MSE) is selected as the split direction.

A flow chart for a computer apparatus implementation of this method is shown in FIG. 38. Measurements of the fit of Ax+C to Y.sub.H and By+C to Y.sub.V are made in steps 3802 and 3804. Test 3806 selects a vertical split 3810 if the vertical "fit" (i.e., MSE) is better than the horizontal fit as shown in the example of FIG. 37. Otherwise test 3806 selects a horizontal split 3808 and the program ends (done).

An advantage of this technique of finding a split direction is that it frequently results in a "fillable" region if most of the error occurs in one-half of the region. In the example of FIG. 37 the vertical fit is good and most of the error in the horizontal direction is on the right-hand side of the image. Thus, for this case a vertical split is favored and the left side of region 3702 will probably not require further splitting since the errors (horizontal) are mostly on the right-hand side.

Returning to FIG. 17, when all the regions have been encoded, the process continues at step 1720. Remerge examines the encoding generated for all regions of the image, and performs some post processing on it to remove some code and to improve the decoding time. If two adjacent regions of like size have been encoded with the same DPCM quantizing table, the split could have been avoided. Remerge (step 1720) will modify the coding stream to replace the two smaller regions by the larger region. The rejoined two regions can thereafter participate in a further rejoining operation with adjacent like sized DPCM encoded regions.

FIGS. 64-66 provide details of intra-frame decoder 1610 that encodes selected images. As previously stated, it may be desirable to select certain types of images, such as images which contain a large amount of fast and/or uncoordinated motion for special treatment in order to enhance the quality of the displayed video. Such images, referred to as selected images, are encoded at multiple levels of resolution. For each level of resolution, a correction image is formed by subtracting the value of each pixel in a reference image having that level of resolution from the value of each corresponding pixel in the selected image of that resolution level. The correction image of that resolution level is then quantized and encoded.

Referring to FIG. 64A and 64B, there is shown a block diagram depicting a preferred embodiment of the selected image encoding method in accordance with the present invention. The full resolution selected image, hereinafter referred to as the level 0 selected image 6402, is resolved into a selected image having a lower resolution, hereinafter referred to as the level 1 selected image 6404. The level 1 selected image 6404 is resolved into a still lower resolution selected image, hereinafter referred to as the level 2 selected image 6406. The level 2 selected image 6406 is resolved into yet a lower resolution selected image which, in a preferred embodiment, is the lowest resolution selected image, hereinafter referred to as the level 3 selected image 6408. In the preferred embodiment, the level 1 selected image 6404 has approximately 1/4 the number of pixels of the full resolution level 0 selected image 6402; the level 2 selected image 6406 has approximately 1/16 the number of pixels of the full resolution level 0 selected image 6402; and the level 3 selected image 6408 has approximately 1/64 the number of pixels of the full resolution level 0 selected image 6402. Although in the preferred embodiment the selected image is resolved into three lower resolution images, resolving the selected image into fewer or more levels of resolution is also within the scope and contemplation of the present invention. Accordingly, the present invention encompasses resolving the selected image into at least one lower level of resolution.

A level 3 correction image 6410 is formed by subtracting the value of each pixel in a level 3 reference image 6412 from the value of each corresponding pixel in the level 3 selected image 6408. The level 3 reference image 6412 comprises an array of pixels all having the same value. In the preferred embodiment, the range of possible pixel values in the selected images 6402, 6404, 6406 and 6408 is from 0 to 255. It is preferred that the value of the pixels in the level 3 reference image 6412 be midpoint in the total pixel value range. Accordingly, each pixel of the level 3 reference image 6412 has a value of 128. As a result, the values of the pixels in the level 3 correction image, hereinafter referred to as difference values D, are signed and range from -128 to +127. The difference value D of each pixel in the level 3 correction image 6410 is then quantized (6414) to form a level 3 quantized correction image 6416. The level 3 quantized correction image 6416 is encoded (6418) as will be subsequently described.

Alternatively, a different constant value of the pixels in the reference image of lowest resolution may be chosen based upon the actual pixel values in the selected image, and that chosen value transmitted in the bitstream. For example, the average value of the actual pixel values in the selected image may be chosen and transmitted in the bitstream.

A level 3 result image 6420 is formed by adding the value of each pixel in the level 3 quantized correction image 6416 to the value of each corresponding pixel in the level 3 reference image 6412. The level 3 result image 6420 is expanded to form an expanded level 3 result image 6422. The vacant pixel locations created by the expansion are filled by pixels whose values are preferably determined by linear interpolation. This is accomplished for example, by adding the values of the pixels on either side of the vacant pixel location, dividing the resultant sum by 2 then inserting the result in the location of the vacant pixel. It should be noted, however, that other methods of interpolation could be used and are considered to be within the scope of the present invention. For example, polynomial interpolation tends to reduce average sizes of pixels in the correction image thereby decreasing the bit rate. In the preferred embodiment, each lower level of resolution contains 1/4 the number of pixels of that of the higher resolution level. However, it should be noted that the number of pixels in each lower resolution level could be reduced by a factor such as two, three, or five, including non-intergal ratios and such is considered within the scope of the present invention. In cases such as these, polynomial interpolation may be especially useful.

A level 2 correction image 6426 is formed by subtracting the value of each pixel in a level 2 reference image which, in the preferred embodiment, is the expanded level 3 result image 6422, from the value of each corresponding pixel in the level 2 selected image 6406. Although the value of each pixel in the level 2 correction image 6426, which is also a signed difference value D could range from -255 to +255, it is preferred that values outside the range of from -128 to +127 are clipped thereby limiting the difference values D to a range of from -128 to +127. The difference values D are quantized (6430) to form a level 2 quantized correction image 6432. The level 2 quantized correction image 6432 is encoded (6434) as will be subsequently described.

A level 2 result image 6436 is formed by algebraically adding the signed difference value D of each pixel in the level 2 quantized correction image 6432 to the value of each corresponding pixel in the expanded level 3 result image 6422. The level 2 result image 6436 is expanded to form an expanded level 2 result image 6440. The vacant pixel locations in the expanded level 2 result image array are filled as previously described with respect to the expansion of the level 3 result image 6420. A level 1 correction image 6442 is formed by subtracting the value of each pixel in a level 1 reference image which, in the preferred embodiment, is the expanded level 2 result image 6440, from the value of each corresponding pixel in the level 1 selected image 6404. The value of each pixel in the level 1 correction image 6442, which is also a signed difference value D ranging, in the preferred embodiment, from -128 to +127, is quantized (6444) to form a level 1 quantized correction image 6446 which is then encoded (6448) as will be subsequently described.

A level 1 result image 6450 is formed by algebraically adding the signed difference value D of each pixel in the level 1 quantized correction image 6446 to the value of each corresponding pixel in the expanded level 2 result image 6440. The level 1 result image 6450 is expanded to form an expanded level 1 result image 6454. The vacant pixel locations in the expanded level 1 result image are filled as previously described with respect to the expansion of the level 3 result image 6420. A full resolution level 0 correction image 6456 is formed by subtracting the value of each pixel in a full resolution level 0 reference image which, in the preferred embodiment, is the expanded level 1 result image 6454, from the value of each corresponding pixel in the level 0 selected image 6402. The value of each pixel in the level 0 correction image 6456, which is also a signed difference value D ranging, in the preferred embodiment, from -128 to +127, is quantized (6458) to form a level 0 quantized correction image 6460 which is encoded (6462) as will be subsequently described.

In the description of the preferred embodiment set forth above, the quantized correction image for each resolution level is based upon the expanded result image of the next lower resolution level. However, the quantized correction image at a particular resolution level could be based upon a result image expanded from a level which is two or more levels of resolution lower and as such is considered within the scope of the present invention. For example, the level 3 result image could be expanded to the level 1 resolution. The level 1 correction image would then be formed by subtracting the value of each pixel in the expanded level 3 result image (which in this case is now the level 1 reference image) from the value of each corresponding pixel in the level 1 selected image.

Although, in the description set forth above, a full resolution correction image is formed, quantized and encoded, such processing of the image at full resolution may not be necessary or desirable to achieve a high quality result. Accordingly, in an alternative preferred embodiment of the present invention, the processing is performed at levels of resolution lower than full resolution. For example, a level 1 image is not formed and expanded to form a full resolution reference image. Nor is a level 0 correction image formed and quantized to form a full resolution quantized correction image to obtain a final image as will be subsequently described. Also in this alternate preferred embodiment, the selected image may be resolved to only two levels of resolution lower than full resolution. That is, the full resolution level 09 selected image is only resolved into the level 1 selected image 6404 and the level 2 selected image 6406. Consequently, level 2 is the lowest level of resolution in this embodiment. Accordingly, the level 2 reference image comprises an array of pixels all having the same value as previously described with respect to the level 3 reference image 6412.

The quantized correction images of the different resolution levels are encoded as follows. The quantized correction image for each level is divided into m.times.n pixel blocks which are grouped into null and non-null regions. In the preferred embodiment m=8 and n=8. A null region is a region in which all pixel values have been quantized to zero. In a non-null region, at least one pixel has a non-zero quantized value. The non-null regions are encoded using a code, type preferably a vector quantization code type such as, for example, quad, dyad or DPCM. * A vector represents one or more pixel values, one being a special case. For example monads, dyads and quads represent vectors of one, two and four pixels respectively. A vector can be applied directly, or could employ spacial prediction. For example, DPCM is used for monads, where each monad is predicted by the immediately preceding monad. In this context, DPCM is considered to be a special case of a vector quantization code type using monads.

It should be noted that the quantized correction image for each level could be divided into linear fill and non-null regions and such is considered within the scope of the present invention. Where the regions are so divided, a null region could be considered a special case of linear fill. It should be further noted that other techniques for still image encoding known in the art could be used; for example, the discrete cosine transform (DCT) technique; and such is also considered to be within the scope of the present invention. In the preferred embodiment, the code type used is explicitly set forth in the bit stream. Accordingly, the decoder recognizes the code type used in connection with each level of resolution by reading a code in the bit stream. This permits any appropriate code type to be used in encoding any level of resolution.

In the preferred embodiment, the quantized correction image having the