Home | Alpha Telephone | Domain Names | Web Hosting | Get Traffic | xrEvidence | xrSoccer

United States Patent

Previous       Show 10       Next


United States Patent 5,852,823
De Bonet December 22, 1998

Image classification and retrieval system using a query-by-example paradigm


Abstract

Apparatus and an accompanying method for generating a semantically based, linguistically searchable, numeric descriptor of a pre-defined group of input images and which is particularly useful in a system for automatically classifying individual images, on a numerical basis, in, e.g., an image database, and, through a query-by-example paradigm, retrieving a desired image(s) therefrom. Specifically, a signature is computed for each image in a set using multi-level iterative convolution filtering, with pixel values supplied as input to each filtering level being separately convolved with each one of a set of predefined Gaussian kernels. Average and variance vectors, as collective numeric descriptors of all the images in the set, are separately computed across corresponding elements in all the image signatures for the set. A linguistic term, semantically descriptive of all the images in the set, is associated with the numeric descriptors of this set and, with this set, is stored in a database. For image retrieval, the descriptor for any set is accessed by a textual search through the database using the appropriate linguistic term. The descriptor is then compared against accessed signatures for other images in the database in order to retrieve a image, among those stored in the database, that is the most similar to those in the set associated with the descriptor.


Inventors: De Bonet; Jeremy S. (Cambridge, MA)
Assignee: Microsoft (Redmond, WA)
Appl. No.: 732615
Filed: October 16, 1996

Current U.S. Class: 707/6; 382/162; 382/165; 382/181; 382/279; 707/104.1
Intern'l Class: G06F 017/30
Field of Search: 707/6,10,202,203,204,2,3,4,5,104 382/279,162,165,181,305


References Cited

U.S. Patent Documents
5493677Feb., 1996Balogh et al.395/600.
5546572Aug., 1996Seto et al.395/600.
5579471Nov., 1996Barber et al.395/326.
5581357Dec., 1996Sasaki et al.348/235.
5628003May., 1997Fujisawa et al.395/615.
5644765Jul., 1997Shimura et al.395/615.
5659626Aug., 1997Ort et al.382/125.
5675788Oct., 1997Husick et al.395/615.
5696964Dec., 1997Cox et al.395/605.
5729741Mar., 1998Liaguno et al.395/615.
Foreign Patent Documents
388 677Oct., 1989EP.


Other References

Kelly et al. "Efficiency Issues Related to Probability Density Function Comparison", SPIE vol. 2670, pp. 42-49 Jan. 1996.
Gong et al. "An Image Database System with Content Capturing and Fast Image Indexing Abilities" IEEE, 1994, pp. 121-130, May 1994.
Barber et al. "Ultimedia Manager: Query by Image Content and it's Applications" IEE, 1994, pp. 424-429, Jan. 1994.
Shann et al. "Detection of Circular Arcs for Content-Based Retrieval from an Image Database" IEE Proc.-Vis. Image Signal Process, vol. 141, No. 1, Feb. 1994, pp. 49-55.
Picard et al. "Finding Similar Patterns in Large Image Databases", IEEE, 1993, pp. 161-164, Apr. 1993.
Martinez et al. "Imagenet: A Global Distribution Database for Color Image Storage and Retrieval in Medical Imaging Systems" IEEE, 1992, 710-719, May 1992.
"The QBIC Project", IBM QBIC Almaden web site, home page (pp. 1-2); printed Sep. 11, 1996.
"A show and tell of the QBIC technology --Query By Image Content (QBIC)", IBM QBIC Almaden web site, pp. 1-4; printed Sep. 11, 1996.
"IBM Ultimedia Manager 1.1 and Clinet Search", IBM software web site, pp. 1-4; printed Sep. 12, 1996.
"Photobook", MIT Media Lab web site; Aug. 7, 1996; pp. 1-2.
"Four Eyes", MIT Media Lab web site; pp. 1-2; printed Sep. 12, 1996.
"Virage -- Visual Information Retrieval", Virage Incorporated, home page; printed Sep. 11, 1996.
"Workshop Report: NSF -- ARPA Workshop on Visual Information Management Systems", Virage Incorporated web. site; pp. 1-15; printed Sep. 11, 1996.
Corporate Overview, Virage Incorporated web site; pp. 1-4; printed Sep. 11, 1996.
"Virage Products", Virage Incorporated web site; pp. 1-2; printed Sep. 11, 1996.
"Frequently asked questions about visual information retrieval", Virage Incorporated web site; pp. 1-3; printed Sep. 11, 1996.
"Visual Information Retrieval: A Virage Perspective Revision 3", Virage Incorporated web site; 1995; pp. 1-13; printed Sep. 11, 1996.
T. Minka, "An Image Database Browser that Learns from User Interaction", Masters Thesis, Massachusetts Institute of Technology; 1996; also appears as MIT Media Laboratory Technical Report 365.
J. Wachman, "A Video Browser that Learns by Example", Masters Thesis, Massachusetts Institute of Technology; 1996; also appears as MIT Media Laboratory Technical Report No. 383.
C. Faloutsos et al, "Efficient and Effective Querying by Image Content", Journal of Intelligent Information Systems:Integrating Artificial Intelligence and Database Technologies, vol.3-4, No. 3, Jul. 1994, pp. 231-262.
R.W. Picard et al, "finding Similar Patterns in Large Image Databases", IEEE ICASSP, Minneapolis, Minnesota, vol. V, pp. 161-164, Apr. 1993; also appears in MIT Media Laboratory Technical Report No. 205.
W.T. Freeman et al, "The Design and Use of Steerable Filters", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, No. 9, Sep. 1991, pp. 891-906.

Primary Examiner: Black; Thomas G.
Assistant Examiner: Wallace; Michael J.
Attorney, Agent or Firm: Michaelson & Wallace, Michaelson; Peter L.

Claims



I claim:

1. A system, responsive to a query image set, for retrieving, from a stored database, a desired image that is visually similar to the image set, said set containing at least one query image, said system comprising:

an image database having stored images therein;

a signature generator, responsive to an input image and having three successive convolution filters, for producing a signature for the input image, the signature containing a numeric measure of each of a plurality of pre-defined characteristics of the input image; wherein the signature generator, in response to the query image set, generates a corresponding signature for each image in the query image set, and, in response to a test image accessed from the stored database, generates a corresponding test signature;

a statistics generator, responsive to the signature for each image in the query image set, for providing a separate pre-defined statistical measure, for each of the pre-defined characteristics, that represents variability of said each pre-defined characteristic across all images in the query image set;

a image database manager, operative in conjunction with the image database, for retrieving successive ones of a plurality of stored images from the database and for routing each one of the plurality of successive images as the test image to the signature generator so as to generate a plurality of corresponding test signatures;

a comparator for comparing, across all of the pre-defined characteristics, the statistical measure for each of the pre-defined characteristics against a corresponding numeric measure in each one of the test signatures so as to yield a similarity measure, the similarity measure for said one of the test images being reflective of a degree of visual similarity between the one test image and the query image set; and

a manager, responsive to the similarity measure for each of the test images and operative in conjunction with the image database manager, for selecting at least one of the test images having a highest relative similarity measure as a retrieved image and providing the retrieved image as output.

2. The system in claim 1 further comprising a path for feeding back the test signature of the retrieved image to the statistics generator such that the statistical measure for each of the pre-defined characteristics is responsive to the query image and the retrieved image.

3. The system in claim 1 wherein each of said filters comprises a convolver for convolving pixel values, of an incoming image and for a given one of a plurality of color axes in the incoming image, with each one of a plurality of predefined kernels to yield a corresponding filter response image for said color axis.

4. The system in claim 3 wherein the statistics generator produces, for each one of said pre-defined characteristics and for each color axis in the incoming image, a value of an average of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield an average vector, and, as said pre-defined statistical measure, a value of variance of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield a variance vector.

5. The system in claim 4 further comprising a path for feeding back the test signature of the retrieved image to the statistics generator such that the statistical measure for each of the pre-defined characteristics is responsive to the query image and the retrieved image.

6. The system in claim 4 wherein the comparator comprises a comparison engine for determining the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

7. The system in claim 6 wherein the comparator comprises:

a subtractor for determining a difference between values in corresponding elements of said average vector and said text signature;

a squarer for squaring a magnitude of the difference so as to yield a squared difference value;

a divider for dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

a summer for summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

an inverter for determining the similarity measure for the test image as a reciprocal of the difference measure.

8. The system in claim 4 wherein the signature generator further comprises first, second and third serially connected convolution filters for iteratively convolving pixel values of said each color axis in the incoming image applied to said signature generator, wherein the first, second and third filters separately convolves the pixel values in the incoming image applied to the first, second and third filters with each one of a plurality of individual kernels in first, second and third sets of predefined kernels, respectively.

9. The system in claim 8 wherein each of the filters further comprises a squarer, responsive to the filter response image, for squaring a magnitude of each pixel value produced by the convolver in said each filter so as to yield a filter response image.

10. The system in claim 9 wherein the signature generator further comprises a sum and normalizer for summing all the pixel values in the filter response image produced by the third filter to yield a resultant sum and for normalizing the resultant sum by a total number of pixels in the filter response image produced by the third filter so as to yield a value of a corresponding one of the plurality of pre-defined image characteristics.

11. The system in claim 10 wherein the first, second and third sets of kernels are identical.

12. The system in claim 8 wherein each of said convolvers comprises a down-sizing convolver.

13. The system in claim 12 wherein each of the convolvers further comprises serially connected horizontal and vertical down-sizing convolvers for convolving said incoming image with first and second half-level kernels, respectively; each one of said individual kernels being a derivative of a Gaussian kernel and separable into a corresponding pair of said first and second half-level kernels.

14. The system in claim 8 wherein the comparator comprises a comparison engine for determining the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

15. The system in claim 14 wherein the comparator comprises:

a subtractor for determining a difference between values in corresponding elements of said average vector and said text signature;

a squarer for squaring a magnitude of the difference so as to yield a squared difference value;

a divider for dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

a summer for summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

an inverter for determining the similarity measure for the test image as a reciprocal of the difference measure.

16. A system, responsive to a term chosen by a user, for retrieving, from a stored database, a desired image that is visually similar to images in a set associated with the term, said set containing at least one query image, said system comprising:

a database having stored therein a plurality of images and a signature associated with each of said images, wherein each of the signatures contains a numeric measure of each of a plurality of pre-defined characteristics of an associated image stored in the database so as to form a plurality of numeric measures in the signature for the associated image;

a manager for accessing, in response to a term associated with the pre-defined set of images, from the database an associated set of signatures for a query image set, and for accessing individual ones of the signatures for a plurality of other images in the database so as to define a plurality of test images and a plurality of corresponding test signatures therefor;

a statistics generator, responsive to the set of signatures for the query image set, for providing separate statistical measures, for each of the pre-defined characteristics, that separately represent average and variability measures of said each pre-defined characteristic across all images in the query image set so as to form separate corresponding pluralities of average and variability measures for all of said characteristics;

a comparison engine for determining, for each of the test signatures, a similarity measure reflective of a degree of visual similarity between a corresponding one of the test images and the query image set, wherein the similarity measure is a function, taken across all of the pre-defined characteristics, of separate corresponding ones of the pluralities of the average and variance measures for the query image set and a separate corresponding one of the plurality of said numeric measures for said each test signature; and

wherein the manager, in response to the similarity measure for each of the test images, selects at least one of the test images having a highest relative similarity measure as a retrieved image and provides the retrieved image as output.

17. The system in claim 16 further comprising a path for feeding back the test signature for the retrieved image to the statistics generator such that the statistical measure for each of the pre-defined characteristics is responsive to the query image set and the retrieved image.

18. The system in claim 16 further comprising a signature generator having a plurality of serially connected convolution filters wherein each of said filters comprises a convolver for convolving pixel values, of an incoming image and for a given one of a plurality of color axes in the incoming image, with each one of a plurality of predefined kernels to yield a corresponding filter response image for said color axis.

19. The system in claim 18 wherein the statistics generator produces, for each one of said pre-defined characteristics and for each color axis in the incoming image, a value of an average of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield an average vector, and, as said pre-defined statistical measure, a value of variance of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield a variance vector.

20. The system in claim 19 further comprising a path for feeding back the test signature for the retrieved image to the statistics generator such that the statistical measure for each of the pre-defined characteristics is responsive to the query image set and the retrieved image.

21. The system in claim 19 wherein the comparison engine determines the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

22. The system in claim 21 wherein the comparison engine comprises:

a subtractor for determining a difference between values in corresponding elements of said average vector and said text signature;

a squarer for squaring a magnitude of the difference so as to yield a squared difference value;

a divider for dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

a summer for summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

an inverter for determining the similarity measure for the test image as a reciprocal of the difference measure.

23. The system in claim 19 wherein the signature generator further comprises first, second and third serially connected convolution filters for iteratively convolving pixel values of said each color axis in the incoming image applied to said signature generator, wherein the first, second and third filters separately convolves the pixel values in the incoming image applied to the first, second and third filters with each one of a plurality of individual kernels in first, second and third sets of predefined kernels, respectively.

24. The system in claim 23 wherein each of the filters further comprises a squarer, responsive to the filter response image, for squaring a magnitude of each pixel value produced by the convolver in said each filter so as to yield a filter response image.

25. The system in claim 24 wherein the signature generator further comprises a sum and normalizer for summing all the pixel values in the filter response image produced by the third filter to yield a resultant sum and for normalizing the resultant sum by a total number of pixels in the filter response image produced by the third filter so as to yield a value of a corresponding one of the plurality of pre-defined image characteristics.

26. The system in claim 25 wherein the first, second and third sets of kernels are identical.

27. The system in claim 23 wherein each of said convolvers comprises a down-sizing convolver.

28. The system in claim 27 wherein each of the convolvers further comprises serially connected horizontal and vertical down-sizing convolvers for convolving said incoming image with first and second half-level kernels, respectively; each one of said individual kernels being a derivative of a Gaussian kernel and separable into a corresponding pair of said first and second half-level kernels.

29. The system in claim 23 wherein the comparison engine determines the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

30. The system in claim 29 wherein the comparison engine comprises:

a subtractor for determining a difference between values in corresponding elements of said average vector and said text signature;

a squarer for squaring a magnitude of the difference so as to yield a squared difference value;

a divider for dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

a summer for summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

an inverter for determining the similarity measure for the test image as a reciprocal of the difference measure.

31. A system, responsive to a term chosen by a user, for retrieving, from a stored database, a desired image that is visually similar to images in a set associated with the term, said set containing at least one query image, said system comprising:

a database having stored therein a plurality of images and a signature associated with each of said images, wherein each of the signatures contains a numeric measure of each of a plurality of pre-defined characteristics of an associated image stored in the database so as to form a plurality of numeric measures in the signature for the associated image;

a manager for accessing, in response to a term associated with the pre-defined set of query images, from the database an associated set of statistical measures descriptive of the query image set, and for accessing individual ones of the signatures for a plurality of other images in the database so as to define a plurality of test images and a plurality of corresponding test signatures therefor, wherein the statistical measures separately represent average and variability measures of a corresponding one of the pre-defined characteristics across all images in the query image set so as to form separate corresponding pluralities of average and variability measures for all of said characteristics;

a comparison engine for determining, for each of the test signatures, a similarity measure reflective of a degree of visual similarity between a corresponding one of the test images and the query image set, wherein the similarity measure is a function, taken across all of the pre-defined characteristics, of separate corresponding ones of the pluralities of the average and variance measures for the query image set and a separate corresponding one of the plurality of said numeric measures for said each test signature; and

wherein the manager, in response to the similarity measure for each of the test images, selects at least one of the test images having a highest relative similarity measure as a retrieved image and provides the retrieved image as output.

32. The system in claim 31 further comprising a signature generator having a plurality of serially connected convolution filters wherein each of said filters comprises a convolver for convolving pixel values, of an incoming image and for a given one of a plurality of color axes in the incoming image, with each one of a plurality of predefined kernels to yield a corresponding filter response image for said color axis.

33. The system in claim 32 further comprising a statistics generator for producing, for each one of said pre-defined characteristics and for each color axis in the incoming image, a value of an average of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield an average vector, and, as said pre-defined statistical measure, a value of variance of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield a variance vector.

34. The system in claim 33 wherein the comparison engine determines the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

35. The system in claim 34 wherein the comparison engine comprises:

a subtractor for determining a difference between values in corresponding elements of said average vector and said text signature;

a squarer for squaring a magnitude of the difference so as to yield a squared difference value;

a divider for dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

a summer for summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

an inverter for determining the similarity measure for the test image as a reciprocal of the difference measure.

36. The system in claim 33 wherein the signature generator further comprises first, second and third serially connected convolution filters for iteratively convolving pixel values of said each color axis in the incoming image applied to said signature generator, wherein the first, second and third filters separately convolves the pixel values in the incoming image applied to the first, second and third filters with each one of a plurality of individual kernels in first, second and third sets of predefined kernels, respectively.

37. The system in claim 36 wherein each of the filters further comprises a squarer, responsive to the filter response image, for squaring a magnitude of each pixel value produced by the convolver in said each filter so as to yield a filter response image.

38. The system in claim 37 wherein the signature generator further comprises a sum and normalizer for summing all the pixel values in the filter response image produced by the third filter to yield a resultant sum and for normalizing the resultant sum by a total number of pixels in the filter response image produced by the third filter so as to yield a value of a corresponding one of the plurality of pre-defined image characteristics.

39. The system in claim 38 wherein the first, second and third sets of kernels are identical.

40. The system in claim 36 wherein each of said convolvers comprises a down-sizing convolver.

41. The system in claim 40 wherein each of the convolvers further comprises serially connected horizontal and vertical down-sizing convolvers for convolving said incoming image with first and second half-level kernels, respectively; each one of said individual kernels being a derivative of a Gaussian kernel and separable into a corresponding pair of said first and second half-level kernels.

42. The system in claim 36 wherein the comparison engine determines the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

43. The system in claim 42 wherein the comparison engine comprises:

a subtractor for determining a difference between values in corresponding elements of said average vector and said text signature;

a squarer for squaring a magnitude of the difference so as to yield a squared difference value;

a divider for dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

a summer for summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

an inverter for determining the similarity measure for the test image as a reciprocal of the difference measure.

44. A method, responsive to a query image set, for retrieving, from a stored image repository, a desired image that is visually similar to the image set, said set containing at least one query image, the method comprising the steps of:

generating, through three successive convolution filters responsive to an input image, a signature for the input image, the signature containing a numeric measure of each of a plurality of pre-defined characteristics of the input image; wherein the signature generating step comprises the steps of:

generating, in response to the query image set, a corresponding signature for each image in the query image set, and

generating, in response to a test image accessed from the stored image repository, a corresponding test signature;

providing, in response to the signature for each image in the stored image repository, a separate pre-defined statistical measure, for each of the pre-defined characteristics, that represents variability of said each pre-defined characteristic across all images in the query image set;

retrieving successive ones of a plurality of stored images from the stored image repository and applying each one of the plurality of successive images as the test image to the signature generating step so as to generate a plurality of corresponding test signatures;

comparing, across all of the pre-defined characteristics, the statistical measure for each of the pre-defined characteristics against a corresponding numeric measure in each one of the test signatures so as to yield a similarity measure, the similarity measure for said one of the test images being reflective of a degree of visual similarity between the one test image and the query image set;

selecting, in response to the similarity measure for each of the test images, at least one of the test images having a highest relative similarity measure as a retrieved image; and

providing the retrieved image as output.

45. The method in claim 44 wherein the stored image repository is a database stored on a server or across a plurality of servers.

46. The method in claim 45 wherein the stored images that collectively form a motion picture or a video program.

47. The method in claim 45 wherein said image providing step comprises the step of providing, as output, textual information accompanying the retrieved image.

48. The method in claim 45 further comprising the step of feeding back the test signature of the retrieved image to the statistical measure producing step such that the statistical measure for each one of the pre-defined characteristics is responsive to the query image and the retrieved image.

49. The method in claim 44 wherein each of said filtering steps comprises the step of convolving pixel values, of an incoming image and for a given one of a plurality of color axes in the incoming image, with each one of a plurality of predefined kernels to yield a corresponding filter response image for said color axis.

50. The method in claim 49 wherein the statistical measures providing step comprises the step of producing, for each one of said pre-defined characteristics and for each color axis in the incoming image, a value of an average of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield an average vector, and, as said pre-defined statistical measure, a value of variance of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield a variance vector.

51. The method of claim 50 further comprising the step of feeding back the test signature of the retrieved image to the statistical measure producing step such that the statistical measure for each one of the pre-defined characteristics is responsive to the query image and the retrieved image.

52. The method in claim 50 wherein the comparing step comprises the step of determining the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

53. The method in claim 52 wherein the comparing step comprises the steps of:

determining a difference between values in corresponding elements of said average vector and said text signature;

squaring a magnitude of the difference so as to yield a squared difference value;

dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

determining the similarity measure for the test image as a reciprocal of the difference measure.

54. The method in claim 50 wherein the signature generating step further comprises first, second and third successively occurring convolution filtering steps for iteratively convolving pixel values of said each color axis in the incoming image applied to said signature generating step, wherein each of the first, second and third filtering steps comprises the step of separately convolving the pixel values in the incoming image applied to the first, second and third filtering steps with each one of a plurality of individual kernels in first, second and third sets of predefined kernels, respectively.

55. The method in claim 54 wherein each of the filtering steps further comprises the step of squaring, in response to the filter response image, a magnitude of each pixel value produced by the convolving step in said each filtering step so as to yield a filter response image.

56. The method in claim 55 wherein the signature generating step further comprises the steps of:

summing all the pixel values in the filter response image produced by the third filter to yield a resultant sum; and

normalizing the resultant sum by a total number of pixels in the filter response image produced by the third filter so as to yield a value of a corresponding one of the plurality of pre-defined image characteristics.

57. The method in claim 56 wherein the first, second and third sets of kernels are identical.

58. The method in claim 54 wherein each of said convolving steps comprises a down-sizing convolution.

59. The method in claim 58 wherein each of the convolving steps further comprises the step of separately and successively performing horizontal and vertical down-sizing convolutions of said incoming image with first and second half-level kernels, respectively; each one of said individual kernels a derivative of a Gaussian kernel and being separable into a corresponding pair of said first and second half-level kernels.

60. The method in claim 54 wherein the comparing step comprises the step of determining the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

61. The method in claim 60 wherein the comparing step further comprises the steps of:

determining a difference between values in corresponding elements of said average vector and said text signature;

squaring a magnitude of the difference so as to yield a squared difference value;

dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

determining the similarity measure for the test image as a reciprocal of the difference measure.

62. A computer-readable medium having computer-executable instructions stored thereon for performing the steps of claim 44.

63. A method, responsive to a term chosen by a user, for retrieving, from a stored database, a desired image that is visually similar to images in a set associated with the term, said set containing at least one query image, the method comprising the steps of:

accessing, in response to a term associated with the pre-defined set of images, from a database an associated set of signatures for a query image set, and accessing individual signatures from the database for a plurality of other images in the database so as to define a plurality of test images and a plurality of corresponding test signatures therefor, the database storing a plurality of images and a signature associated with each of said images, wherein each of the signatures contains a numeric measure of each of a plurality of pre-defined characteristics of an associated image stored in the database so as to form a plurality of numeric measures in the signature for the associated image;

providing, in response to the set of signatures for the query image set, separate statistical measures, for each of the pre-defined characteristics, that separately represent average and variability measures of said each pre-defined characteristic across all images in the query image set so as to form separate corresponding pluralities of average and variability measures for all of said characteristics;

determining, for each of the test signatures, a similarity measure reflective of a degree of visual similarity between a corresponding one of the test images and the query image set, wherein the similarity measure is a function, taken across all of the pre-defined characteristics, of separate corresponding ones of the pluralities of the average and variance measures for the query image set and a separate corresponding one of the plurality of said numeric measures for said each test signature;

selecting, in response to the similarity measure for each of the test images, at least one of the test images having a highest relative similarity measure as a retrieved image; and

providing the retrieved image as output.

64. The method in claim 63 wherein the stored image repository is a database stored on a server or across a plurality of servers.

65. The method in claim 63 further comprising the step of feeding back the test signature of the retrieved image to the statistical measure producing step such that the statistical measure for each one of the pre-defined characteristics is responsive to the query image and the retrieved image.

66. The method in claim 63 wherein said image providing step comprises the step of providing, as output, textual information accompanying the retrieved image.

67. The method in claim 63 further comprising a signature generating step having a plurality of convolution filtering steps wherein each of said filtering steps comprises the step of convolving pixel values, of an incoming image and for a given one of a plurality of color axes in the incoming image, with each one of a plurality of predefined kernels to yield a corresponding filter response image for said color axis.

68. The method in claim 67 wherein the statistical measures providing step comprises the step of producing, for each one of said pre-defined characteristics and for each color axis in the incoming image, a value of an average of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield an average vector, and, as said pre-defined statistical measure, a value of variance of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield a variance vector.

69. The method in claim 68 further comprising the step of feeding back the test signature of the retrieval image to the statistical measure producing step such that the statistical measure for each one of the pre-defined characteristics is responsive to the query image and the retrieved image.

70. The method in claim 68 wherein the determining step comprises the step of ascertaining the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

71. The method in claim 70 wherein the ascertaining step comprises the steps of:

determining a difference between values in corresponding elements of said average vector and said text signature;

squaring a magnitude of the difference so as to yield a squared difference value;

dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

determining the similarity measure for the test image as a reciprocal of the difference measure.

72. The method in claim 68 wherein the signature generating step further comprises first, second and third successively occurring convolution filtering steps for iteratively convolving pixel values of said each color axis in the incoming image applied to said signature generating step, wherein each of the first, second and third filtering steps comprises the step of separately convolving the pixel values in the incoming image applied to the first, second and third filtering steps with each one of a plurality of individual kernels in first, second and third sets of predefined kernels, respectively.

73. The method in claim 72 wherein each of the filtering steps further comprises the step of squaring, in response to the filter response image, a magnitude of each pixel value produced by the convolving step in said each filtering step so as to yield a filter response image.

74. The method in claim 73 wherein the signature generating step further comprises the steps of:

summing all the pixel values in the filter response image produced by the third filter to yield a resultant sum; and

normalizing the resultant sum by a total number of pixels in the filter response image produced by the third filter so as to yield a value of a corresponding one of the plurality of pre-defined image characteristics.

75. The method in claim 74 wherein the first, second and third sets of kernels are identical.

76. The method in claim 72 wherein each of said convolving steps comprises a down-sizing convolution.

77. The method in claim 76 wherein each of the convolving steps further comprises the step of separately and successively performing horizontal and vertical down-sizing convolutions of said incoming image with first and second half-level kernels, respectively; each one of said individual kernels being a derivative of a Gaussian kernel and separable into a corresponding pair of said first and second half-level kernels.

78. The method in claim 72 wherein the determining step comprises the step of ascertaining the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

79. The method in claim 78 wherein the ascertaining step further comprises the steps of:

determining a difference between values in corresponding elements of said average vector and said text signature;

squaring a magnitude of the difference so as to yield a squared difference value;

dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

determining the similarity measure for the test image as a reciprocal of the difference measure.

80. A computer-readable medium having computer-executable instructions stored thereon for performing the steps of claim 63.

81. A method, responsive to a term chosen by a user, for retrieving, from a stored database, a desired image that is visually similar to images in a set associated with the term, said set containing at least one query image, the method comprising the steps of:

accessing, in response to a term associated with the pre-defined set of query images, from a database an associated set of statistical measures descriptive of a set of pre-defined characteristics of the query image set, and accessing individual ones of the signatures from the database for a plurality of other images in the database so as to define a plurality of test images and a plurality of corresponding test signatures therefor, the statistical measures separately representing average and variability measures of a corresponding one of said pre-defined characteristics across all images in the query image set so as to form separate corresponding pluralities of average and variability measures for all of said characteristics, wherein the database stores a plurality of images and a signature associated with each of said images, each of the signatures containing a numeric measure of each one of the pre-defined characteristics for an associated image stored in the database so as to form a plurality of numeric measures in the signature for the associated image;

determining, for each of the test signatures, a similarity measure reflective of a degree of visual similarity between a corresponding one of the test images and the query image set, wherein the similarity measure is a function, taken across all of the pre-defined characteristics, of separate corresponding ones of the pluralities of the average and variance measures for the query image set and a separate corresponding one of the plurality of said numeric measures for said each test signature;

selecting, in response to the similarity measure for each of the test images, at least one of the test images having a highest relative similarity measure as a retrieved image; and

providing the retrieved image as output.

82. The method in claim 81 wherein the stored image repository is a database stored on a server or across a plurality of servers.

83. The method in claim 81 said image providing step comprises the step of providing, as output, textual information accompanying the retrieved image.

84. The method in claim 81 further comprising a signature generating step having a plurality of convolution filtering steps wherein each of said filtering steps comprises the step of convolving pixel values, of an incoming image and for a given one of a plurality of color axes in the incoming image, with each one of a plurality of predefined kernels to yield a corresponding filter response image for said color axis.

85. The method in claim 84 further comprising the step of producing, for each one of said pre-defined characteristics and for each color axis in the incoming image, a value of an average of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield an average vector, and, as said pre-defined statistical measure, a value of variance of the numeric measure of said one pre-defined characteristic across all the signatures in the query image set so as to yield a variance vector.

86. The method in claim 85 wherein the determining step comprises the step of ascertaining the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

87. The method in claim 86 wherein the ascertaining step comprises the steps of:

determining a difference between values in corresponding elements of said average vector and said text signature;

squaring a magnitude of the difference so as to yield a squared difference value;

dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

determining the similarity measure for the test image as a reciprocal of the difference measure.

88. The method in claim 85 wherein the signature generating step further comprises first, second and third successively occurring convolution filtering steps for iteratively convolving pixel values of said each color axis in the incoming image applied to said signature generating step, wherein each of the first, second and third filtering steps comprises the step of separately convolving the pixel values in the incoming image applied to the first, second and third filtering steps with each one of a plurality of individual kernels in first, second and third sets of predefined kernels, respectively.

89. The method in claim 88 wherein each of the filtering steps further comprises the step of squaring, in response to the filter response image, a magnitude of each pixel value produced by the convolving step in said each filtering step so as to yield a filter response image.

90. The method in claim 89 wherein the signature generating step further comprises the steps of:

summing all the pixel values in the filter response image produced by the third filter to yield a resultant sum; and

normalizing the resultant sum by a total number of pixels in the filter response image produced by the third filter so as to yield a value of a corresponding one of the plurality of pre-defined image characteristics.

91. The method in claim 90 wherein the first, second and third sets of kernels are identical.

92. The method in claim 88 wherein each of said convolving steps comprises a down-sizing convolution.

93. The method in claim 92 wherein each of the convolving steps further comprises the step of separately and successively performing horizontal and vertical down-sizing convolutions of said incoming image with first and second half-level kernels, respectively; each one of said individual kernels being a derivative of a Gaussian kernel and separable into a corresponding pair of said first and second half-level kernels.

94. The method in claim 88 wherein the determining step comprises the step of ascertaining the similarity measure, for the test image, as a function of a difference between corresponding elements in said average vector and said test signature, and a corresponding element in said variance vector.

95. The method in claim 94 wherein the ascertaining step further comprises the steps of:

determining a difference between values in corresponding elements of said average vector and said text signature;

squaring a magnitude of the difference so as to yield a squared difference value;

dividing said squared difference value by a corresponding element in the variance vector to yield an elemental variability value;

summing the elemental variability values across all of the predefined characteristics and for all of the color axes, to yield a difference measure; and

determining the similarity measure for the test image as a reciprocal of the difference measure.

96. A computer-readable medium having computer-executable instructions stored thereon for performing the steps of claim 81.
Description



BACKGROUND OF THE DISCLOSURE

1. Field of the Invention

The invention relates to a system, specifically apparatus and accompanying methods for use therein, for automatically classifying individual images, on a numerical basis, in, e.g., an image database and, through a query-by-example paradigm, retrieving a desired image(s) therefrom, all without a need for a user to specify desired image features and their individual weights.

2. Description of the Prior Art

Textual representations constitute a relatively small portion of all the currently available information in the world. Nevertheless, dating back many centuries, textual representations, whether handwritten or machine printed, have been and are still used to abstract and convey information from one person to another through a printed (including numeric) alphabet, with language and numbers representing semantic concepts. However, a language by its very nature of representing abstract concepts through an organized contextual hierarchy built on a finite vocabulary, be it words or pictographs (e.g., words, having given meanings--though contextually variable, are grouped by an individual into sentences, sentences into paragraphs, and so forth), often results in communication that is indefinite and ambiguous. An image, unburdened by the limitations inherent in any language, is a far more efficient vehicle to convey information among individuals than text is now and is ever likely to be. Accordingly, humans constantly use their sense of vision in perceiving information and communicating it amongst themselves. As such, the world is awash in and dominated by visual information of one sort or another, whether it is, e.g., a single image such as on a photographic print, printed on a piece of paper or stored in an electronic memory, or a sequence of images such as in a motion picture or in a video signal. Images are so prevalent, as a means of communication, that humans create a huge and escalating number of new images every single day. Recent advances in computing and telecommunications are only increasing the use and dominance of visual information in modern day life.

Many different ways exists to classify and search textual information, particularly with a computer. Any textual item, by its very nature, residing in a database is written in a language, such as English, that, for the most part, has a well-defined and bounded vocabulary. Such a vocabulary readily lends itself to searching for any word (or other similar granular linguistic construct) to locate a stored textual entry of interest. While a textual database itself can be inordinately large, i.e. contain an extraordinarily massive amount of different textual entries, various algorithms exist, which by exploiting a well-defined vocabulary and its usage inherent in a language, such as English, permits a computer to efficiently index and retrieve any of the items stored in the database. In that regard, certain of these algorithms index an item by examining it for the presence of any so-called "keywords". Once any such word is found, a pointer to a stored record of that textual item is added to an appropriate classification (list) defined by (and data structure associated with) that keyword. Each such classification, generally consists of a list of pointers, with each pointer defining a location in a massive textual database at which the corresponding textual record for that item (or a portion thereof, such as an bibliographic abstract) is stored. All the keyword lists collectively define a keyword database. Keyword based retrieval systems generally operate by querying the keyword database with a user supplied keyword(s) to retrieve pointers to all the corresponding records that contain the keyword(s) and then present these pointers (in the form of a numbered list of records) to the user for subsequent selection thereamong. Once the user has selected which records (s)he wants, (s)he can then instruct the system to retrieve, display and/or print the complete item stored within each of the selected records.

Unlike text, an image, from a semantic perspective, is not defined by a linguistic or mathematical vocabulary. In that regard, any such vocabulary is often inadequate to fully describe all but very simple images. As such, human beings, whose communication is biased heavily towards using linguistic based verbal and printed expressions, are ill-equipped to fully describe anything more complex than a very simple image.

Given this linguistic deficiency, then, not surprisingly, computerized information search systems, have yet to, and probably will not for some time, be developed that can semantically categorize an image. Hence, users of existing computerized search systems had little choice but to find a desired image indirectly--by performing a keyword search to locate, e.g., an article that contained not only a textual description of a desired image, such as in a figure caption, but also hopefully (though without any guarantees) the desired image itself. However, this approach often failed and hence frustrated the user because it still relied on a linguistic description--one that was inadequate to fully describe just the particular image which the user wanted and, not surprisingly, often returned an article(s) having a wrong image(s) from that desired.

In view of an increasing predominance of imagery in present day (and, certainly, expected future) communications, a general problem, thus far unmet in the art, has existed, for some time--though recently becoming more acute, as to just how images, by themselves and devoid of accompanying descriptive text, can be efficiently and accurately manipulated, i.e. how such large numbers of images can first be indexed into an image database and a desired image(s) then accurately retrieved therefrom.

Given the absence of a suitable vocabulary to describe image semantics, conventional image classification and retrieval schemes simply relied on a human being to subjectively assess semantic content of images, on an image-by-image basis, for both indexing and retrieval.

In particular, a conventional image repository, commonly referred to as a "photo shop", usually employs an individual to view, become familiar with and appropriately index each and every image in an image collection. Subsequently, a customer, seeking a certain image, will verbally describe various salient aspects of that image in an effort to convey a "gut feeling" of the image to that employee. As an example, suppose a customer desires an image of John Kennedy evoking an appearance of "determination". The employee, using his (her) knowledge of the image collection, will then retrieve an image from the collection thought to be most similar to that which the customer has described. For example, the employee may first examine a group of images which may have been classified as depicting Presidents in office. The employee will present the retrieved image, perhaps an image of a concerned Kennedy sitting at desk in the Oval Office during a meeting involving the Cuban Missile Crisis, to the customer and request whether that image is the one desired or not, and, if not, why, in terms of differences between the desired and retrieved images, the retrieved image is not desired. In this situation, the customer may respond by stating that he (she) wants an image of Kennedy standing in front of an American flag rather than in the Oval Office. Armed with this information, the employee will then return to the collection to narrow the search and retrieve a closer image, if it is available, such as, e.g., Kennedy giving a charged public speech in West Berlin, to that desired. This manual process will iteratively repeat until either all the similar images in the collection, have been retrieved by the employee and rejected or the desired image (or one sufficiently similar) has been found. Alternatively, depending on the granularity of the index, a relatively large group of images may be found, such as all those containing past Presidents, through which a user or customer will be forced to manually examine, in seriatim, each image in the group to locate the desired image. Though human experiences are generally similar across different people, each person who views and indexes images does so based on his (her) own subjective criteria. While these criteria are generally quite broad (such as here perhaps including past Presidents as one, and American flags as another) and to a certain extent overlap among individuals (another individual might use Presidents in public), the decision as to whether an image possesses a given semantic content, hence falling within one of a number of given criteria and then should be retrieved or not based on a verbal description of what is desired, is highly subjective and, for a common image, often varies widely across different viewers. While a single human can index and effectively deal with perhaps as much as 100,000 different images or so, a photo shop often has a collection of considerably more images, such as several 100K images to several million, if not more. With such large collections, indexing is performed by several different individuals; the same occurs for retrieval. Hence, owing to the highly subjective nature of human-based indexing and retrieval, inconsistent results often occur from one individual to the next.

Moreover, no finer granularity than relatively broad criteria (e.g., images of "chairs") is generally used to classify images. Also, in large image collections, images, possessing one criteria (e.g. "chairs"), are generally not cross-indexed, to images possessing another criteria (e.g. depicting "gardens") such that images having both criteria, i.e. a sub-set (e.g. "a chair in a garden") can be readily located by itself and to the exclusion of images having just one of the criteria (e.g. just chairs or just gardens). Furthermore, images are classified with a limited and bounded set of linguistic criteria (e.g. an image of a "chair"). Unfortunately, doing so often results in a customer describing an image using terms (e.g. an image of a "light switch") that have not been used as a class descriptor. Thus, such manual image indexing and retrieval methodologies tend to be highly frustrating and inefficient to use, and quite problematic in their results.

While such manual approaches are still used with relatively small image collections, these approaches become totally useless if one desires to index massive numbers of images provided by, e.g., many currently available image sources. For example, a single video stream, such as programming carried over a broadcast channel, contains a substantial number of images, though successive images possess significant commonality. Currently, hundreds of different cable channels are available each providing a different video stream. To form a comprehensive image database, each and every different image in each video stream might need to be indexed. A similar problem would be posed by indexing images that appear in all recorded footage. Another source of a potentially infinite number of images is the world wide web wherein the number of new visual data sites continues to exponentially grow, with the images provided therefrom exponentially increasing at a significantly greater rate. In any event, a huge number of different images exist both now and increasingly so in the future which are likely to constitute an image database, with far more images than any manual methodology can handle.

Clearly, indexing all such images from all such sources, or even just a small fraction of these images, into a common image database is an enormous task which is only feasible, if at all, if it can be automated in some fashion.

In an effort to overcome the deficiencies inherent in conventional manual indexing and retrieval methodologies, the art has indeed turned to automated, i.e. computerized, techniques. However, in practice, none of these techniques has yet proven entirely satisfactory.

One such technique involves a so-called "query by image content (QBIC)" paradigm. This technique is typified by work currently undertaken in "The QBIC Project" by IBM Corporation (see the web site at http://wwwqbic.almaden.ibm. com), in the so-called "Visual Information Retrieval" technology being developed at Virage Corporation (see the web site at http://www.virage.com); and in the "Photobook" project currently underway at the Media Lab at Massachusetts Institute of Technology (see the web site at http://www-white.media.mit.edu/vismod/demos/photobook). In general, the QBIC technique relies on classifying an image according to a relatively small number of pre-defined fixed image features (also referred to as characteristics or attributes), such as distribution of color across an image, shapes in an image including their position and size, texture in an image, locations of dominant edges of image objects and regions, and so forth. For each image, a computerized system scans the image and measures each such characteristic. The premise behind using such characteristics is to mimic those visual attributes with which humans are familiar and use in recognizing an image. Once these attributes are measured for each image, a sequence of numeric values, i.e. a vector, results for that image. A user desiring to find a given image in a QBIC image database queries the database by providing an example of an image similar to that which he (she) desires and then setting a weight for each such characteristic in a fashion he (she) believes accurately reflects the presence of each attribute in the desired image as compared to that in the test image. For example, if a desired image is to have less variation in color across the image than does the test image, then the user will ostensibly choose a relatively low weight for color distribution, and so forth for other weights. The attributes in the example image are then measured. To retrieve an image, the system compares the vector for the test image, modified by the weights provided by the user, to the vector for each image in the database. A difference measure, based on a mathematical difference, between these two vectors is computed for each database image. The retrieved image with the lowest difference measure is then presented to the user. The user, upon viewing that retrieved image, can adjust the weights to refine the selection process in an attempt to retrieve another image closer to that which he (she) desires, and so forth, until presumably the closest image in the database to that desired is eventually retrieved.

While the QBIC technique represents an advance in machine based image indexing and retrieval, this technique suffers two basic infirmities.

First, the number of attributes is generally limited to between 5-10. This very small number simply fails to provide sufficient resolution to adequately describe the visual characteristics of most images. While, at first blush, it would seem trivial to extend a set of attributes to encompass additional ones, considerable difficulty exists in specifying just what each additional attribute should be. Specifically, as additional characteristics are added, they tend to become increasingly abstract and difficult for a user to comprehend and visualize.

Second, the user is burdened with selecting the proper numeric value for each weight. Not only is the user rather ill-equipped for the task of deciding just what value should be used for each weight for a visually apparent attribute, but as additional increasingly abstract image attributes are used particularly those which the user can not readily visualize and comprehend, the difficulty inherent in this task greatly compounds.

As a result of these practical limitations, the number of image attributes in a QBIC system remains small at approximately 5-10 rather broad characteristics. Consequently, a fairly large group of images is usually produced in response to any query to a QBIC system, necessitating that the user manually review each and every resulting image. Doing so is often quite labor and time intensive and, as such, generally infeasible for practical use.

Apart from a significant effort potentially required of a user during a search, the user generally needs to expend a considerable amount of time and effort just to properly learn how to use a QBIC system, including how to correctly set the weights. Inasmuch as any user will still set the weights subjectively, then, if different users--even those who are highly trained--were to search for a common image in a common image database, the subjectivity exhibited by these users will likely yield different and often inconsistent images. These factors further reduce the attractiveness of using a QBIC system.

Another conventional technique for automating image classification and retrieval uses, e.g., so-called "eigenimages" which are mathematical techniques for clustering vectors in space. An additional technique known in the art measures a distribution of colors, in terms of a histogrammed frequency of occurrence across a query image and for an entire color gamut. This histogrammed distribution is also measured for each image in an image database, with a distance measure then used to compare the histogrammed results between each database image and the query image. The database image possessing the smallest distance measure is presented to the user as a retrieved image. Each of these two alternate techniques suffers the same infirmity inherent in a QBIC system; namely, for a user, it is both labor and time intensive. Specifically, both of these systems exhibit insufficient resolution, which, in turn, often yields a large group of retrieved images that a user must individually review in seriatim.

Thus, an acute need still remains in the art for an effective automated image classification and retrieval system. Such a system should not be either time or labor intensive for an individual to learn, much less use. In that regard, the system should classify images with sufficient resolution so as to reduce the number of retrieved images presented to a user at any one time. The system should also not require a user to specify any desired image features or set any numeric weights thereof. Additionally, the system should substantially eliminate user induced subjectivity and produce highly consistent results, in terms of the images retrieved, across different users.

SUMMARY OF THE INVENTION

The present invention satisfies this need and, in so doing, advantageously eliminates the deficiencies inherent in conventional image classification and retrieval techniques.

In accordance with the present invention, which utilizes a query by example paradigm, a user supplies a group of so-called query images in which each image is structurally similar to those additional images (s)he desires. Then, the invention, through a mathematical assessment of visual similarity using a substantial number of different pre-defined visual characteristics of the query images, taken collectively, against each image in an image database and, using an inverse variance based measure, returns, as its results, the image(s) in the database that is most similar to the query images.

Specifically, through my invention, a signature is generated for each query image. Each such image can reside on a database or be externally supplied. For each such image, its signature is generated by iterative multi-level convolution filtering of pixel values for each color axis of that image. At each level, incoming pixel values are applied to a convolution filter and are separately convolved with each one of a set of, e.g., predefined derivatives of Gaussian kernels, with the resultant pixel values then being squared to yield an output image. The output image is either supplied as input to the next filtering level or its pixel values are summed. This sum becomes the value of a corresponding element in the signature. For illustratively three filtering levels with, e.g., 25 kernels applied at each level, values of 25.sup.3 (or 15,625) different visual characteristics result for each color axis in the query image. Once a signature has been generated for each query image, statistic vectors, illustratively having average and variance measures, are separately computed across corresponding elements and for each color axis, in all the query image signatures. These vectors mathematically describe the query images taken collectively.

Once these vectors are computed, then the signature of each image, i.e. as a so-called "test" image, having a signature referred to as a "test" signature, in the database is compared against the average and variance vectors. During comparison operations, the average and variance vectors remain fixed. A "similarity" measure, being a numeric measure reflective of the relative similarity between the test image and between all the query images viewed collectively, is computed as typically a function of a sum, across all the vector elements and for all the color axes, of the difference between the corresponding elements in the average and test signature vectors divided by the corresponding element in the variance vector.

Since the image characteristics are all pre-defined and fixed, then, the signature of each test image may be precomputed, i.e. computed and stored for use during a subsequent image retrieval operation--which is likely to save significant processing time during a retrieval operation, or may be computed as needed for immediate use for comparison purposes. Based upon user specification, a certain number of images, such as five or ten, that have the largest similarity measures are then rank-ordered, in terms of descending similarity, and presented for display to the user for either acceptance or rejection.

My invention advantageously eliminates a need for the user to have any prior knowledge of any visual characteristic used to classify an image and hence renders the search and retrieval operations essentially independent of the user. Similarity among query images and between query and test images is determined substantially, if not totally, independent of the user. As such, the present invention advantageously eliminates any need for the user to weight any image characteristic vis-a-vis other such characteristics. Not only does this greatly relieve the burden on the user but also it substantially, if not totally, reduces any user induced subjectivity in retrieval operations, thus returning consistent results across an entire user community.

As a feature, the present invention allows a user to refine the retrieval process to locate images of increased relevance to the query images. Specifically, any retrieved image can be applied as an additional query image. In this case, the average and variance vectors will be updated to account for the signature for this new query image, and the entire retrieval process repeated.

Furthermore, as another feature of the invention, once a signature has been calculated for an image, whether it is a query or test image, that signature can be stored for further use thereby obviating the need to re-calculate the signature of that image during a retrieval operation, and hence substantially reduce attendant processing time for retrieval.

Moreover, query signatures can be grouped together and indexed with a common linguistic (or other) term that is descriptive, e.g. semantically, of the entire group of images. This allows a textual, rather than just mathematical, vocabulary to be built to define a group of images and permit easy retrieval, through a textual database search, of the signatures of this group for use during a subsequent retrieval.

As a further feature of the invention, not only can the signature of each image residing in an image database be stored for subsequent use, but also so too can the average and variance vectors for any group of images. These vectors can be precomputed for a given group of images, stored within a database, and then, at some time later accessed and directly compared against each signature in the database to yield an associated similarity measure. As with storage of precomputed signatures, storage and utilization of precomputed average and variance vectors advantageously eliminates any need to directly process images during a retrieval operation. This further reduces processing time over merely storing complete signatures inasmuch as average and variance statistics for a group of signatures need not be calculated during a retrieval operation.

As such, a user can apply a class of related images of a unitary semantic concept to the invention, provide, e.g., a linguistic description, by way of one or more terms or keywords, and have the images stored, and the signatures and average and variance vectors computed therefor and, along with the accompanying description, stored for this image class (group) for future use. These vectors form a mathematical based semantic description of this image class. In addition, the class can be readily supplemented later with an additional image(s). In this case, the average and variance vectors would be updated to account for the additional image. As additional images are added to any class and the vectors updated accordingly, these vectors advantageously become increasingly robust and discriminating descriptors of that class.

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 depicts illustrative characteristic vectors and variability therebetween for a series of corresponding images;

FIG. 2 depicts a high-level block diagram of system 200 that implements the present invention;

FIG. 3 depicts a high-level block diagram of application programs 229 that execute within system 200;

FIG. 4 depicts a block diagram of one embodiment of inventive image retrieval system 340 that forms part of application programs 229 shown in FIG. 3;

FIG. 5 depicts the organization and constituent fields of illustrative signature vector 500;

FIG. 6 depicts, in pictorial form, compound and half-level kernels, the latter used in conjunction with iterative convolution filters 710 shown in FIG. 7;

FIG. 7 depicts a block diagram of signature computation engine 410 shown in FIG. 4;

FIG. 8 depicts the proper alignment of the drawing sheets for FIGS. 8A and 8B;

FIGS. 8A and 8B collectively depict a flowchart of signature computation process 800 which is performed by computation engine 410;

FIG. 9 depicts a block diagram of comparison engine 450 shown in FIG. 4;

FIGS. 10A and 10B, 11A and 11B, and 12A and 12B pictorially depict first, second and third pairs, respectively, of a group of exemplary query images into a pre-defined image database, and a corresponding group of resulting search images as retrieved from that database and as produced through use of the present invention, respectively; and

FIG. 13 depicts a block diagram of another embodiment, here system 1300, of the inventive image retrieval system that forms part of application programs 229 shown in FIG. 3.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION

After considering the following description, those skilled in the art will clearly realize that the teachings of my invention can be readily utilized in substantially any image retrieval system to locate a desired image(s), regardless of the semantic content of the images sought or searched. For example, the invention can find use in automating search and retrieval aspects of a so-called "photo shop" where one image or a group of images that share a common semantic theme is retrieved from a generalized image database that stores a multitude of unrelated digitized images, and/or of a specialized image library, such as in retrieving a particular image or group thereof from a collection of highly related images, such as digitized medical X-ray or CAT scan images. To simplify and facilitate reader understanding. I will describe my invention in the context of use in conjunction with a "photo shop" type image retrieval application.

In essence, my invention utilizes a query by example paradigm. With this paradigm, a user supplies a group of so-called query images in which each image is visually similar to those additional images (s)he desires. Then, my invention, through a mathematical assessment of visual similarity using a substantial number of different pre-defined visual characteristics of the query images, taken collectively, against each image in an image database and using an inverse variance based measure, returns, as its results and without human intervention such as through, e.g., setting weights and the like, the image(s) in the database that is most similar to the query images. Advantageously, my invention renders the search and retrieval operations totally independent of the user, thus advantageously eliminating a need for the user to have any prior knowledge of any characteristic used to classify an image, let alone weight that characteristic vis-a-vis other such characteristics.

In accordance with my inventive teachings, to assess visual similarity between a group of query images and each image in the image database, it is first necessary to quantitatively measure the presence of each visual characteristic (feature) in each query image and determine just those particular visual characteristics that accurately and collectively describe the entire group of query images. These particular characteristics, both in terms of those which exist in these images and their quantitative measures relative to each other (i.e. an implicit ranking or weighting), describe the query images taken as a whole. Once this subset has been so defined, then a resulting image(s) that is ultimately retrieved from the database and presented to the user is that which possesses visual characteristics that are the most similar, in terms of the characteristics that are present and their values, to those in the group of query images, and hence should be a member of the same image subset that contains the query images.

Consider FIG. 1 which depicts illustrative characteristic vectors and variability therebetween for a group of corresponding query images. Here, illustrative query image group 10 contains four query images 12, 14, 16 and 18 (also labeled as images I, II, III and IV, respectively)--the actual visual content of each of these images is immaterial. Each image in the group can be classified according to a plurality of pre-defined fixed image characteristics, here illustratively characteristics a, b, c, d and e (not explicitly shown). Ignoring, for the moment, just what exact visual attribute is delineated by each characteristic, let us assume that these five such characteristics are used to classify each and every image and each such characteristic specifies a different corresponding visual attribute of an image. For every image, the measurement of each of its five characteristics (a, b, c, d and e) forms a characteristic vector for that image, with the measured value of each characteristic placed in a corresponding element of the vector. In that regard, characteristic vectors 22, 24, 26 and 28 are associated with images 12, 14, 16 and 18, respectively, and contain respective elements 22a, 22b, 22c, 22d and 22e; 24a, 24b, 24c, 24d and 24e; 26a, 26b, 26c, 26d and 26e; and 28a, 28b, 28c, 28d and 28e.

As depicted by corresponding arrows, of which arrow 34 is illustrative, image 12 contains two characteristics (a and e), having values residing in elements 22a and 22e, that are relatively low in value. Image 14 contains relatively high and low values of characteristic a and e, residing in elements 24a and 24e, respectively, of vector 24. Image 16 contains relatively moderate and low values of characteristics a and e, respectively, residing in elements 26a and 26e of vector 26. Lastly, image 18 contains relatively high and low values of characteristics a and e, respectively, residing in elements 28a and 28e of vector 28.

Those characteristics which exhibit relatively low variability across all the query images, such as characteristic e--which is relatively low in each of the query images, are accurate descriptors of the group of query images. In sharp contrast, characteristics, such as characteristic a, which have a relatively high variability across all the query images are a rather poor descriptor of that group. Consequently, each of those images in the image database that possess low values of characteristic e are more likely to be similar to the query images and hence is a likely candidate for inclusion in the same image subset that contains the group of query images. Through my inventive teachings, a variance based measure, as described in detail below, is used to assess whether an image is sufficiently similar to the group of query images as to be included in the subset thereof and so returned to the user.

In order to provide sufficient resolution to classify images with relatively fine content and yet accurately discriminate among individual images in a massive image database, my inventive image classification and retrieval technique utilizes a substantial number of different visual characteristics. As contrasted with conventional image retrieval techniques, such as QBIC, that use only 5-10 characteristics which mimic those visual attributes with which humans use in recognizing an image, but which also provide grossly inadequate resolution, my inventive technique is not so constrained. In sharp contrast, and as described below, my inventive technique uses illustratively 25.sup.3 different characteristics (which equals 15,625) (as discussed in considerable detail below) for each color axis (e.g. additive three-color space: red, green and blue) in the image, for a total of, e.g., 3.times.25.sup.3 or 46,875 such characteristics for each image. If color spaces with more than three axes (such as, e.g., subtractive four-color space: cyan, magenta, yellow, black), then the number of characteristics linearly varies accordingly. By isolating a user from a need to have any knowledge of, much less weight, or even interact with any of these characteristics, my invention utilizes levels of increasingly abstract mathematically-based characteristics that have no direct mapping to any verbally describable image feature, but provide substantially enhanced resolution over that heretofore attained in the art.

Before proceeding to an in-depth discussion of those image characteristics used in my invention--which are discussed below in conjunction with FIGS. 6, 7 and 8A-8B, to facilitate reader understanding. I will now address high-level aspects of my invention, in terms of its overall implementation and in the underlying software.

FIG. 2 depicts a high-level block diagram of system 200 that implements the present invention. As shown, this system contains computer system 220 which itself comprises input interface (I/F) 222, processor 224, memory 226 and output interface 230, all conventionally interconnected by bus 240. Memory 226, which generally includes different modalities (all of which are not specifically shown for simplicity) illustratively random access memory (RAM) and hard disk storage, stores operating system (O/S) 228 and application programs 229. O/S 228 may illustratively constitute the WINDOWS NT operating system presently available from the Microsoft Corporation of Redmond, Washington (WINDOWS NT is a registered trademark of the Microsoft Corporation). The specific software modules that implement the inventive image classification and retrieval system reside within application programs 229.

Incoming images are provided to system 200, via input path 210, through a wide variety of sources--all of which are collectively referenced as sources 205. These sources, of which only a few are specifically shown, illustratively include: images provided through an Internet connection, from a broadcast receiver (such as a television or satellite receiver), from a video cassette recorder (VCR) or a digital camera, from a live video feed(s), from an external database(s), and/or from a document scanner. Inasmuch as my invention will function with images regardless of their source, the particular modality through which any such image is physically provided to system 200 is immaterial. Input interface 222 is connected to path 210 and contains appropriate circuitry to provide the necessary and corresponding connection, e.g. modem or network interfaces for Internet access, required to physically connect and interface each differing source of incoming digitized images to computer system 220. In addition, input interface 222 connects and interfaces user input device 270, such as a keyboard and mouse, to computer system 220. Display 250, such a conventional color monitor, and image printer 260, such as a conventional color laser printer or other well-known color printer, are connected, via leads 232 and 234, respectively, to output interface 230. Output interface 230 provides requisite circuitry to electrically connect and interface display 250 and printer 260 to computer system 220.

In operation, once an incoming image (from whatever source) is applied to system 220 and stored therein as part of an image database residing within memory 226 and typically within hard disk storage therein (though not specifically shown), that image is eventually processed by processor 224 under control of application programs 229 and specifically those software modules, as discussed below, that implement the inventive image classification and retrieval system. O/S 228 can be configured to continuously execute these application programs such that system 220 will automatically accept and classify new images as they are applied to system 220 from any one of sources 205.

A user can also invoke the present invention, through appropriate commands entered via device 270 and in response to a program icon appearing on display 250, to either classify a new image which the user will provide to system 200 and/or retrieve a desired image(s) from the image database.

Should the user invoke an image retrieval function, then that user will supply a set of query images, which can be images that are either already stored within the image database or are not, from any image input source, via path 210, to system 220. That system will then, in an inventive manner as discussed in considerable detail below, retrieve an image(s) from all those stored in the image database and which is the most similar to the query images, taken collectively, and display that image on display 250. The user, through user input device 270, can either select that particular image for saving via, e.g., printing through printer 260, or some other mechanism that provides persistent storage, or reject that image in favor of retrieving another similar image, and so forth.

Inasmuch as the specific hardware components of computer system 220 as well as all aspects of the software stored within memory 226, apart from those modules that implement the present invention, are conventional and well-known, they will not be discussed in any further detail.

FIG. 3 depicts a high-level block diagram of application programs 229 that execute within system 200. As shown and in pertinent portion, these programs include Internet connectivity programs 320 and inventive image retrieval system 340. Programs 320, which are conventional, include web crawler 322 and web browser 326. If, as noted above, the operating system is configured such that the inventive retrieval and classification system automatically classify incoming images, as they arrive, web crawler 322 can be used to search, on an automated basis, the Internet, on a site-by-site basis progressing through the net, to download available images and accompanying text. Browser 326 lo can be used by a user stationed at computer system 220 (see FIG. 2) to manually search desired Internet sites and download images and textual material the user has found of interest. Images obtained through programs 320 are routed, via paths 330, along with images provided from other sources, to inventive image retrieval system 340. This system applies retrieved digitized images to output path 350 and accepts selections, and other user based input, via path 360. Paths 310, 330, 350 and 360 are all software-defined, as contrasted with leads 232 and 234 shown in FIG. 2 which are physical connections. Inasmuch as all the intervening software and/or hardware components for interfacing these software-defined paths to the actual hardware shown in FIG. 2 are readily apparent to those skilled in the art and do not form part of the present invention, then, for simplicity, these components have all been omitted and will not be addressed hereinafter.

FIG. 4 depicts a block diagram of one embodiment of inventive image retrieval system 340 that forms part of application programs 229 shown in FIG. 3. FIG. 5 depicts the organization and constituent fields of an illustrative signature vector. The reader should simultaneously refer to both FIGS. 4 and 5 throughout the following discussion.

To fully elucidate the present invention, it is worthwhile to first provide a general overview of my inventive image processing methodology before delving into the specifics shown in this figure.

In particular and as alluded to above, each query image is first classified. During the course of doing so, a signature vector (for brevity hereinafter referred to as just a "signature"), such as signature 500, is fabricated for that image. Each axis in the color space for each image is classified by illustratively 25.sup.3 different characteristics. As such, for the illustrative red, green, blue (RGB) color space, this vector has three sub-vectors 510, 520 and 530 for red, green and blue respectively, each of the sub vectors containing a field for a value of each such characteristic for that color axis. For simplicity, the RGB color space will be assumed hereinafter. Accordingly, each sub-vector contains 25.sup.3 (equaling 15,625) fields. Illustratively, sub-vector 510 contains fields 510.sub.1, 510.sub.2, . . . , 510.sub.y ; sub-vector 520 contains fields 520.sub.1, 520.sub.2, . . . , 520.sub.y ; and sub-vector 530 contains fields 530.sub.1, 530.sub.2, . . . , 530.sub.y (where y equals 15,625). A complete signature for each different RGB image contains three constituent sub-vectors for a total of 3.times.25.sup.3 (or 46,875) fields.

Once a signature has been fabricated for all the query images, statistics are computed across all these signatures. The resulting statistics, which collectively describe all the query images, constitute an average signature vector and a variance vector (for each color axis in the color space of the input image). Each individual element in the average and variance vectors are the average and variance, respectively, of the same corresponding elements (i.e. those which have same index) across the signatures for all the different query images. Once the average and variance vectors are computed, then the signature of each image, i.e. as a so-called "test" image having a signature referred to as a "test" signature, in the image database is compared against the average and variance vectors. During comparison operations, the average and variance vectors remain fixed. In that regard, a "similarity" measure is computed as typically a function of a sum, across all the vector elements and for all the color axes (e.g., R, G and B), of the difference between the corresponding elements in the average and test signature vectors divided by the corresponding element in the variance vector. Inasmuch as the image characteristics which I use are all pre-defined and fixed, then, the signature of each test image may be precomputed, i.e. computed and stored for use during a subsequent image retrieval operation--which is likely to save significant processing time during a retrieval operation, or may be computed as needed for immediate use for comparison purposes. Based upon user specification, a certain number of images, such as five or ten, that have the largest similarity measures are then rank-ordered, in terms of descending similarity, and presented for display to the user for either acceptance or rejection.

In sharp contrast with conventional image retrieval systems, my inventive system relies on automatically computing the similarity measure without any user intervention, such as in setting weights or the like. This, in turn, advantageously permits my inventive system to utilize an extremely large and diverse set of, e.g., increasingly abstract, image characteristics which are far in excess of that heretofore possible and thereby achieve excellent search resolution through massive number of image files.

With the above in mind and returning to FIG. 4, inventive system 340 contains image source selection block 405, signature computation engine 410, statistics computation block 430, image database system 440, comparison engine 450, image retrieval and display manager 460, image information database system 470 and graphical user interface (GUI) 480.

Incoming images from various sources are routed, via paths 330, to image source selection block 405. This block, based upon prior user input, selects one of the image sources. To simplify matters, let us assume for the moment that the user is providing a succession of query images, in seriatim (i.e. a "set" or group of such images), to the system. Each query image generated by the selected source is then routed through block 405, via path 408, to signature computation engine 410. This engine computes, through an iterative multi-level convolution based filtering operation (as discussed in detail below), the complete signature for that query image. Once all the signatures for all the query images have been fabricated, as symbolized by lines 420, statistics computation block 430 computes the average and variance vectors for all the query image signatures. As noted above, the average and variance vectors are simply formed of the average and variance on an element-by element basis across all corresponding elements of all the query image signatures, with separate average and variance sub-vectors for each color axis. Although only one sub-vector is specifically shown for each of the average and variance vectors, each of these vectors contains three sub-vectors, one for each color axis, in the RGB color space, in the same manner as does an image signature. Hence, the average and variance vectors each illustratively contains 3.times.25.sup.3 (or 46,875) elements.

Once the average and variance vectors are computed for all the query images in the set, these vectors are routed via paths 432, 436 and 438, to one input of comparison engine 450. At this point, image retrieval and display manager 460 instructs, via path 467, image database system 440, and specifically database manager 442 therein, to access a first image stored within image database 446. Database manager 442 can be any conventional database manager designed to manipulate a database of stored images. Illustratively, these stored images can be individual frames that collectively form a motion picture or other video program. In any event, once this first image is accessed, this image is routed as a "test" image, by database manager 442, via path 441, to signature computation engine 410. Signature computation engine 410 computes a complete signature for this test image. This signature is computed in the exact same manner as accomplished for each query image. Once the first test image signature has been computed, it is routed, via path 425, to a second input of comparison engine 450 and also, via manager 460, for storage within database 476. In addition, image database 446 also stores a pointer to each full image stored therein for use in subsequently accessing and/or identifying that image. Accordingly, a pointer to the accessed first test image is routed by database manager 442 to image retrieval and display manager 460 for temporary storage within database 476.

Comparison engine 450 compares the first test image signature against the average and variance vectors and produces a similarity measure. The higher the numerical value of this measure is, then the closer the test image, in terms of the presence and prevalence in the test image of the pre-defined characteristics in the query images weighted by the query-variance, is to the query images. This measure is applied, via path 455, to image retrieval and display manager 460 for local storage. Manager 460 forms an association between the similarity measure then incoming from path 455 and the pointer, provided by database manager 442, to the first test image, and then stores both of the items temporarily as a common entry within a local file (not specifically shown). Once the entry for the first test image is stored, manager 460 then instructs database manager 442, via path 467, to access the next successive image stored within image database 446 as a second test image. The signature for this second test image and its resulting similarity measure are then successively calculated and temporarily stored, within database 476, along with the corresponding pointer to this particular test image, and so forth for every other image within database 446. Manager 460 locally stores the entries ranked in terms of descending similarity measures within image information database system 470. This database system is managed by database manager 472.

After all the images stored within image database 446 have been processed as "test" images with corresponding signatures and similarity measures generated therefor and stored within database 476, image retrieval and display manager accesses, via manager 472 and from database 476, the pointers for a group of images that have the highest similarity measures and are to be displayed as "retrieved" images. The number of images in this group is predefined, such as illustratively five or ten images, with this number either being a default value or over-ridden by user specification. The retrieved images are either presented to the user sequentially, in terms of decreasing similarity measure, for manual acceptance or rejection by the user. To accomplish this, manager 460 queries, via manager 472, database 476 for the entries for the group of images with the highest similarity measures. The resulting pointers to each image in the group returned by database manager 472 is then passed, via manager 460, back, via path 467, to image database system 440. There, database manager 442 accesses each of these particular images and routes it, in its entirety, to manager 460. The manager then, through GUI 480, displays this image on the display monitor and solicits input from the user. The user, through his(her) selection(s) applied to system 340 can either accept this image, i.e. signify that the retrieval is complete, or have the next retrieved image in succession, i.e. that having the next highest similarity measure, accessed and displayed. Alternatively, the user can instruct manager 460 to simultaneously display all or some of the retrieved images in a non-overlapping fashion on the monitor. In this instance, manager 460 will successively access each such retrieved image from image database system 440, and, once all such images have been obtained, then, assemble and paint, through GUI 480, an appropriate image display, containing all these images, on the monitor.

To facilitate understanding, image database 446 and image information database 476 are shown as separate databases. Clearly, these databases can be combined into one common database. Furthermore, image database 446, while shown as part of image retrieval system 340, can be sited on a server, such as a single Internet server, or even distributed over a plurality of Internet and/or other servers, all or most of which are remote from system 340. Furthermore, the pointer for each image can be a so-called "URL" (uniform resource locator) to an Internet location where that image is stored.

To refine the retrieval process, the user can also instruct the image retrieval and display manager to use any retrieved image as a additional query image in order to refine a subsequent retrieval operation for that query. In this case, the average and variance vectors will be updated to account for the signature for this new query image, and the entire retrieval process repeated.

In particular, this additional image, particularly its signature, can be added to an existing group of query images--as symbolized by, e.g., dashed line 469 in FIG. 4. Once this occurs, the average and variance vectors would be recomputed and updated to take into account all the query images, including the additional image. The resulting updated average and variance vectors would then be used during subsequent retrieval. In this manner, as the system retrieves images of increasing interest and relevance to a user, that user can iteratively and increasingly refine the query to locate any such images that are likely to be of even further interest. In a similar manner, rather than just adding an image(s) (or its signature(s)) to a group of query images, the user, though suitable instructions provided to manager 460, can edit the gas often as once or iteratively as often as desired, by deleting a image from the group and/or substituting a newly retrieved image for one of lesser interest in the group, and repeating the retrieval process. All such edits will further refine the group of query images and hence the ensuing retrieval operation based thereon. Through such interactivity, the user can very effectively guide the inventive system in locating and retrieving an image(s), from those stored within database 446, that, semantically speaking, is increasingly closer to that desired by the user and in less time than might otherwise occur, while avoiding the retrieval of images that are only tangentially relevant.

To significantly increase retrieval speed, image signatures can be pre-computed for each image within image database 446 and stored, along with its associated image pointer, within database 476. Under this scenario, each stored signature rather than an entire image will be accessed, with the accessed signature applied, as symbolized by dashed path 468, directly to comparison engine 450 for comparison with the average and variance vectors. Inasmuch as a signature comparison, as discussed below, consumes substantially fewer calculations than those required to generate a complete signature, significant reductions in retrieval time will result through signature pre-computing and storage. As an aside, the similarity between any two different images can be readily determined by comparing their two signatures and determining a value, such as an inverse of summed absolute differences between corresponding elements in and across both signatures, representative of a degree of similarity therebetween. As the value increases for any pair of images relative to that for other image pairs, so does the similarity of the former over the latter.

Prior to proceeding with a detailed discussion of signature computation engine 410, to simplify understanding, I will digress somewhat and address convolution, including compound and particularly half-level convolution kernels, as used in the present invention.

A given visual characteristic in an image can be measured by convolving that image with a corresponding convolution kernel, in which that kernel has been specifically defined to detect that characteristic. For example, by convolving an input image by a kernel, typically in pictorial form (-.sym.), designed to detect edges, a resulting response image will predominantly contain the edges depicted in the original image but not areas of constant tone. Different characteristics, such as the presence of horizontal lines, vertical lines, diagonal lines and others, can be detected and hence different response images will result from separately convolving the input image with different predefined kernels appropriate for detecting each one of these different characteristics. By convolving an image with certain kernels, substantially all the frequency information, particularly the low frequency information, will be preserved in the response image. This frequency preservation also occurs with a one-half down-sizing convolution, e.g. where every other, rather than every, neighborhood in an input image is convolved. Through such downsizing, the convolved response image will be one-fourth the size of the input image (reduced by one-half in both the vertical and horizontal directions) but will contain substantially all the frequency information, and particularly the low frequency information, carried by the input image and the kernel.

With this in mind, I have recognized that since convolution with certain kernels preserves substantially all the frequency information of a specific type associated with the nature of the kernel residing in an input image in a response image, then additional image characteristics in the input image can be detected by iteratively convolving that image, each time with a set of kernels. In that regard, an input image can be convolved with a first set of kernels to produce a first level response image which itself, in turn, can be convolved with a second set of kernels, and so forth, to produce an exponentially larger number of different, increasingly abstract, image characteristics. For example, if a set of illustratively 25 unique kernels is used at each level, then with three iterative levels of convolution, 25.sup.3 (15,625) different image characteristics will occur. Though a different set of kernels can be used at each level, for simplicity, the same set is preferably used at each level. A three-level iterative convolution performed over three different color axes in the input image (here R, G and B) will yield a total of 3.times.25.sup.3 (46,875) different image characteristics. Such a massive number of different visual characteristics can advantageously provide sufficient resolution to accurately search for a single desired image across a massive number of different images. I have found that increasing the number of iterations past three yields relatively little useful information content at the expense of a substantial increase in a number of required computations. In addition, the number of kernels used at each level is not critical provided a large enough number (K) is used such that based on the number (L) of iterative convolution levels used, a sufficiently large number (K.sup.L) of different image characteristics results for each different color axis to provide satisfactory resolution for retrieval.

Furthermore, in accordance with my invention, the convolution kernels used at each level are derivatives of so-called Gaussian kernels and specifically two-dimensional derivatives of Gaussian kernels. In general and as known in the art, a derivative of a Gaussian function has excellent information carrying properties. It is well known that such a function can be well modeled by small kernels, such as matrices illustratively 3.times.3, or 5.times.5 in size. See, e.g., W. T. Freeman et al, "The Design and Use of Steerable Filters", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 9, September 1991, p. 891-906. As long as a complete set of such kernels is used, then a convolved response image can accurately and efficiently reproduce substantially all the information content in an input image. As per my inventive teachings, to fabricate a set of illustratively 25 two-dimensional kernels, two s