
Jacob Chandran

IM 6, English 12 AP
5/15/17
Torralba, Antonio, Murphy, Kevin P., and Freeman, William T., Sharing Visual Features
for Multiclass and Multiview Object Detection, IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 29, pp. 854-869, May 2007.

The problem of detecting a generic category of object in clutter is addressed in
this paper. Traditional approaches require applying different classifiers to the image, at
multiple locations and scales. This can be slow and can require a lot of training data,
since each classifier requires the computation of many different image features. In
particular, for independently trained detectors, the (runtime) computational complexity,
and the (training-time) sample complexity, scales linearly with the number of classes to
be detected. This research presents a multi-task learning procedure, based on boosted
decision stumps, that reduces the computational and sample complexity by finding
common features that can be shared across the classes (and/or views). The detectors
for each class are trained jointly, rather than independently. For a given performance
level, the total number of features required, and therefore the run-time cost of the
classifier, is observed to scale approximately logarithmically with the number of classes.
The features selected by joint training are generic edge-like features, whereas the features
chosen by training each class separately tend to be more object-specific. The research
addresses common issues related to zero-shot learning, such as object detection,
interclass transfer of knowledge, shared feature sets, and multiclass classification.

A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, Describing Objects by their
Attributes, Proc. IEEE Conf. Computer Vision and Pattern Recognition
(CVPR), 2009.

The human brain allows us to put different objects into categories on the spot
regardless of whether we have seen the object before or not. Even if we see a new object,
the brain still receives information on the object using words to describe it, for example
yellow, spotted, and small. Computers do not have this capability and if they do not
recognize a certain object, the computers will not obtain any information about the
object. In order to program a computer to recognize certain objects, thousands of images
must be passed through until the computer can recognize the object based on certain
attributes. Now, computer scientists are trying to get computers to recognize semantic
qualities of an object like yellow, spotted, or small for both known and unknown objects.
This is difficult because there are many kinds of spots: if the computer did not recognize
the particular kind of spot, it would return no data, whereas a human would still be
able to recognize the spot, even if they had never seen that exact spot before. This
example represents the main dilemma for visual recognition which is the question of how
to get a computer to analyze an object that it had never encountered before. This paper
has helped me understand the basic abilities of visual technology up until this point in
time, and the main problem that computer scientists are trying to solve in order to make
visual technology more like that of the human brain.

Bishop, Christopher M., Pattern Recognition and Machine Learning, Chapter 7: Sparse
Kernel Machines, pages 325-339, Springer, 2007.

The Support Vector Machine (SVM) is a popular method for solving problems
in classification and novelty detection. An important property of support vector machines
is that determination of the model parameters corresponds to a convex optimization
problem, so any local solution is also a global optimum. The support vector machine,
however, is fundamentally a two-class classifier, while in practice we often need a
multiclass classifier where the number of classes is more than two. Various methods
have been proposed for combining multiple two-class SVMs in order to build a
multiclass classifier. One commonly used approach is the one-versus-the-rest approach:
if there are K classes, K separate SVMs are constructed, where the kth model is trained
using data from class k as positive examples and data from the remaining K-1 classes
as negative examples. A disadvantage of this method is that combining the decisions of
the individual classifiers can lead to inconsistent results in which an input is assigned to
multiple classes simultaneously. In spite of these disadvantages, the SVM is among the
best classifiers available for object classification; in our research the number of
classes is large (e.g., 33 classes), and the one-versus-the-rest approach is used. It is
important to know about the different equations used for Support Vector Machines
because for different situations, different equations need to be used. This article was
helpful in describing the different equations and why they are used.
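The one-versus-the-rest construction described above can be sketched in a few lines. The following is an illustrative Python/NumPy version in which regularized least-squares linear scorers stand in for full SVM training (the data and class layout are made up); the K-classifier scheme itself is the one the chapter describes.

```python
import numpy as np

def train_one_vs_rest(X, y, n_classes, reg=1e-3):
    """Fit one linear scorer per class: +1 for class k, -1 for everything else.

    Regularized least squares stands in for the SVM objective here; the
    one-versus-the-rest scheme itself is unchanged.
    """
    X1 = np.hstack([X, np.ones((len(X), 1))])      # append a bias column
    W = []
    for k in range(n_classes):
        t = np.where(y == k, 1.0, -1.0)            # class k vs. the rest
        A = X1.T @ X1 + reg * np.eye(X1.shape[1])
        W.append(np.linalg.solve(A, X1.T @ t))
    return np.array(W)                             # shape (K, d+1)

def predict_one_vs_rest(W, X):
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(X1 @ W.T, axis=1)             # most confident scorer wins

# toy 3-class problem: three well-separated clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 20)
W = train_one_vs_rest(X, y, 3)
accuracy = (predict_one_vs_rest(W, X) == y).mean()
```

Note the inconsistency the chapter warns about: each scorer is trained independently, and the argmax merely picks the most confident one.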

Boyd, S., and Vandenberghe, L., Convex Optimization, Chapter 8.6: Classification,
pages 422-431, Cambridge University Press, 2004.

In a pattern recognition and classification problem, two sets of patterns are given
and the task is to find a function that is positive on the first set and negative on the
second. In linear discrimination, an affine function classifies one set of points as
positive and the other set as negative. Geometrically, a hyperplane that separates the
two sets of points is determined. When the two sets of points cannot be linearly
separated, it is necessary to find an affine function that approximately classifies the
points, for example one that minimizes the number of misclassified points. In general,
this problem is difficult to solve exactly, but an approximate linear discrimination based
on the support vector classifier can be used to solve the problem. The
support vector machine is very successful in image classification and hence is used in our
research. At APL, one of the foundational aspects of work is programming a support
vector machine so it is necessary to understand its processes and how it can aid in
classifying objects.
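The geometric picture described here can be made concrete: an affine function f(x) = w·x + b is positive on one set, negative on the other, and its zero set is the separating hyperplane. A minimal sketch with hand-picked (hypothetical) weights and toy points:

```python
import numpy as np

# Two point sets in the plane; the affine function f(x) = w.x + b should be
# positive on the first set and negative on the second.  The zero set of f
# is the separating hyperplane (a line in 2-D).  The weights here are
# hypothetical, picked by hand for this toy data.
w = np.array([1.0, -1.0])
b = 0.0

positives = np.array([[2.0, 0.0], [3.0, 1.0], [1.5, 0.5]])
negatives = np.array([[0.0, 2.0], [1.0, 3.0], [0.5, 1.5]])

def f(X):
    return X @ w + b

# approximate linear discrimination: count how many points are misclassified
misclassified = int((f(positives) <= 0).sum() + (f(negatives) >= 0).sum())
```

When the sets are not linearly separable, `misclassified` cannot be driven to zero, which is exactly the case the chapter's approximate formulation addresses.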

C. H. Lampert, H. Nickisch, and S. Harmeling, Attribute-Based Classification for
Zero-Shot Visual Object Categorization, IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 36, no. 3, March 2014.
Standard computer systems have the ability to recognize certain objects and group
them into broad categories; for example, regardless of the breed, the computer would
categorize a dog as a dog. The systems are trained with thousands of images to recognize
these objects, but if the system encountered a new object, it would classify the object
based on what it knew. If the system only knew the classification, dog, then if it saw a
basketball, it would still call it a dog. Computer scientists are now training systems to
recognize attributes of the objects instead of just labeling the object. This would not only
create a more specific labeling system, but would allow classification of unknown
objects. As described earlier, the current computer systems could incorrectly classify an
unknown object, but if computers were able to recognize attributes, the same ability that
led the system to classify a dog as fuzzy, could recognize a carpet as fuzzy, even if the
system had never encountered a carpet before. This research explains the effort to make
attributes dynamic and applicable to different objects instead of static descriptions that
cannot be used in the classification of other objects.

Danilo Šćepanović, 6.094 Introduction to MATLAB, January IAP 2010. Massachusetts
Institute of Technology: MIT OpenCourseWare.

This lecture covered MATLAB basics: how to use MATLAB as an interpreted
programming language and how to write MATLAB scripts (programs) in the Editor.
Naming variables of types such as integer, double, and string is explained. Reading data as a matrix,
determining its column and row vectors, finding its length and operating on the elements
of the vector (squaring matrix, multiplying using a constant) are explained. It also
explains how the processed data in the matrix can be stored as a MAT file for future use
by loading the data into the MATLAB environment. Basic scalar operations such as
arithmetic operations, exponentiations, complex expressions, using MATLAB built in
operations (square root, absolute value, etc.) are also explained. MATLAB matrix
operations such as transpose, square, addition, and subtraction are explained. The basic plotting
utilities to plot the results were demonstrated.
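The lecture's MATLAB operations have direct counterparts in other array languages; the sketch below mirrors them in Python/NumPy for comparison (the matrix values are made up). MATLAB's MAT-file save/load step is imitated with NumPy's `.npy` format written to an in-memory buffer.

```python
import io
import numpy as np

# Reading data as a matrix and basic vector/matrix operations, mirrored in
# NumPy (values here are made up for illustration).
A = np.array([[1.0, 2.0], [3.0, 4.0]])

col = A[:, 0]            # first column vector
row = A[0, :]            # first row vector
n = len(col)             # length of a vector

squared = A ** 2         # element-wise squaring
scaled = 3 * A           # multiplying by a constant
At = A.T                 # transpose
s = np.sqrt(np.abs(A))   # built-in operations: square root, absolute value

# storing processed data for future use (MATLAB's MAT file <-> NumPy's .npy)
buf = io.BytesIO()
np.save(buf, scaled)
buf.seek(0)
reloaded = np.load(buf)
```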

Erik G. Miller, Nicholas E. Matsakis, and Paul A. Viola, Learning from One Example
Through Shared Densities on Transforms, Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, 2000.

Congealing is a process in which elements of a dataset (images) are brought into
correspondence with each other jointly, producing a data-defined model. It is based upon
minimizing the summed component-wise (pixel-wise) entropies over a continuous set of
transforms on the data. One of the byproducts of this minimization is a set of transforms,
one associated with each original training sample. This paper demonstrates a procedure
for effectively bringing test data into correspondence with the data-defined model
produced in the congealing process. A probability density function is developed over the
set of transforms that arose from the congealing process. The density over transforms
may be shared by many classes, and it is demonstrated how this density as prior
knowledge can be used to develop a classifier based on only a single training example
for each class. The techniques applied in this paper for information sharing between
objects can be used in zero-shot learning.
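The entropy criterion at the heart of congealing is easy to state: for a stack of aligned binary images, compute the empirical probability of a 1 at each pixel and sum the per-pixel entropies; good alignment lowers the sum. A toy sketch (tiny hand-made 1-D "images", not the paper's data):

```python
import numpy as np

def pixelwise_entropy(stack):
    """Summed per-pixel binary entropy of a stack of binary images."""
    p = stack.mean(axis=0)                    # P(pixel == 1) at each position
    eps = 1e-12                               # avoid log(0)
    h = -(p * np.log2(p + eps) + (1 - p) * np.log2(1 - p + eps))
    return float(h.sum())

# three tiny 1-D "images" of the same bar, shifted (misaligned) vs. aligned
misaligned = np.array([[1, 1, 0, 0],
                       [0, 1, 1, 0],
                       [0, 0, 1, 1]])
aligned = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [1, 1, 0, 0]])

# congealing searches over transforms (here: shifts) to minimize this sum
h_before = pixelwise_entropy(misaligned)
h_after = pixelwise_entropy(aligned)
```

The transforms found while driving `h_before` down to `h_after` are exactly the byproduct over which the paper builds its shared density.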

Fan, R., Chang, K., Hsieh, C., Wang, X., and Lin, C., LIBLINEAR: A library for large
linear classification, The Journal of Machine Learning Research, vol. 9, pp.
1871-1874, 2008.

Solving large-scale classification problems is important in visual object
recognition in computer vision. One machine learning technique used in classification is
linear classification, where the data used have many features. LIBLINEAR is a simple,
easy-to-use tool for linear classification of large datasets. The solvers in LIBLINEAR
perform well and have good theoretical properties. LIBLINEAR supports a linear
Support Vector Machine classifier. The
algorithm exposes two functions, train and predict: the train function fits a classifier on
the given data, and the predict function labels new inputs. The algorithm works well as a
two-class (binary) classifier; to handle multiclass problems, the one-versus-the-rest
strategy is used. The train and test procedure takes less than 15 seconds on a large
dataset. In research at APL, the LIBLINEAR algorithm is used to classify different
objects, and learning about the algorithm is essential to understanding how it can be
manipulated to aid in classification.
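The two-function interface described above can be mimicked in miniature. The sketch below keeps LIBLINEAR's train/predict shape but substitutes a plain perceptron for its solvers, on a hypothetical +1/-1 binary problem:

```python
import numpy as np

# LIBLINEAR exposes just two entry points, train and predict.  This sketch
# mimics that interface with a plain perceptron in place of LIBLINEAR's
# solvers, for a binary (two-class) problem with labels +1/-1.
def train(X, y, epochs=20, lr=0.1):
    w = np.zeros(X.shape[1] + 1)             # weights plus a bias term
    X1 = np.hstack([X, np.ones((len(X), 1))])
    for _ in range(epochs):
        for xi, yi in zip(X1, y):
            if yi * (xi @ w) <= 0:           # misclassified: nudge the hyperplane
                w += lr * yi * xi
    return w

def predict(w, X):
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(X1 @ w)

X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
model = train(X, y)
preds = predict(model, X)
```

The one-versus-the-rest extension mentioned above would simply call `train` once per class with relabeled targets.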

G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A.C. Berg, and T.L. Berg, Babytalk:
Understanding and Generating Simple Image Descriptions, Proc. IEEE Conf.
Computer Vision and Pattern Recognition (CVPR), pp. 1601-1608, 2011.

Computer scientists have created a system to reduce a picture with many objects
into words describing the picture. The system uses a natural language processor to take
all the objects in the picture and turn them into words, similar to what the human brain
would do. This process demonstrates a refined attribute processor which can correctly
label different objects in a picture. In addition, this has the potential to make labeling
more specific, labeling objects inside a picture as opposed to just labeling the picture as
one object. This refined ability to label smaller objects in a larger picture is critical to the
research done at APL, which aims to improve the computer's ability to recognize an
object based on a given description.

Jayaraman, D., and Grauman, K., Zero-Shot Recognition with Unreliable Attributes,
Advances in Neural Information Processing Systems 27, Dec 2014.

Zero-shot learning allows information transfer through attributes of categories
chosen during training. Using generic attributes (e.g., striped, four-legged), a classifier
can be constructed for unknown objects that are not in the training set. For example, even
without labeling any images of zebras, one could build a zebra classifier by instructing
the system that zebras are striped, black and white. But in practice, standard zero-shot
methods suffer because attribute prediction in novel images is unreliable, owing to the
specificity of each object. In this work a new classification method based on random
forests is developed to train zero-shot models that take the unreliability of attribute
prediction into account. The given attribute signatures for each category, together with
the output characteristics of the attribute classifiers, are used to select discriminative
and predictable attributes. This technology is important so that classification is built only upon attributes
that can be generalized to new objects. If specificity is not something the machine is built
for yet, only general attributes can be used so that the system can universalize
characteristics to unknown objects. This technology will be important until specificity is
intentional for all objects.

Jegou, H. and Chum, O. "Negative evidences and co-occurrences in image retrieval: The
Benefit of PCA and Whitening," in ECCV, pages 774-787, 2012.

The main problem with large scale image retrieval and object classification is that
many approaches are limited to search in a database of only a few million images on a
single machine due to computational or memory constraints. In this paper a short vector
representation of images, which compacts images, is introduced as a possible solution to
the memory constraints. The Principal Component Analysis (PCA) method is used to
perform dimensionality reduction which would basically compact the information into a
smaller space. PCA is a two-step process: (1) centering the data and (2) selecting a
decorrelated (orthogonal) basis of a subspace that minimizes the dimensionality
reduction error. The method uses two feature sets for image classification: the Vector of
Locally Aggregated Descriptors (VLAD) and Bag of Words (BOW). VLAD and BOW vectors are high
dimensional and therefore take up lots of space. The size of VLAD is 512 x 128 and
BOW ranges from one thousand to one million in size. To limit the size of these feature
sets while still retaining most of the important information, dimensionality reduction by
PCA can be used to compact these matrices.
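The two PCA steps can be sketched directly with NumPy; the "descriptor" matrix below is random data standing in for high-dimensional VLAD or BOW vectors, and 16 is an arbitrary target dimension.

```python
import numpy as np

# Made-up "descriptor" matrix standing in for high-dimensional VLAD/BOW vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))               # 200 descriptors, 64 dimensions

# (1) center the data
mu = X.mean(axis=0)
Xc = X - mu

# (2) pick the orthogonal (decorrelated) basis minimizing reduction error:
# the top right-singular vectors of the centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
d = 16                                       # target dimensionality
basis = Vt[:d]                               # (d, 64) orthonormal rows

Z = Xc @ basis.T                             # compact 16-D representation
X_approx = Z @ basis + mu                    # reconstruction from the subspace
```

Whitening, which the paper pairs with PCA, would additionally divide each coordinate of `Z` by the corresponding singular value.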

Lampert, C.H., Nickisch, H., and Harmeling, S, Learning to Detect Unseen Object
Classes by Between-Class Attribute Transfer, Proceedings IEEE Conference
Computer Vision and Pattern Recognition (CVPR), 2009.

This article discusses the latest research in the classification of visual objects for
which no training examples are available (Zero-Shot Learning). Standard object
classification methods use labeled training images to classify objects; humans, however,
are capable of identifying an object if they are provided with a description of it
(e.g., a large gray animal with a long trunk is an elephant). In attribute-based classification,
high level descriptions of objects like color, shape or geographic information are used
instead of only images to classify the objects. This allows attributes of known objects to
be transferred and used to describe objects of an unknown class for which no training
data exists. For example, for an attribute striped, images of zebras, bees and tigers can be
used. Many tests have been done passing known and unknown objects through an
attribute based classification system and the results show that by including specific
attribute information for classes, information can be transferred between known and
unknown classes, which forms the foundation of zero-shot learning.
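The signature-matching idea can be reduced to a few lines. In this hypothetical sketch (attributes and classes invented for illustration), attribute predictions for a novel image are matched to per-class attribute signatures by Hamming distance, so a described-but-never-seen class can still win:

```python
import numpy as np

# Hypothetical class-attribute signatures (columns: striped, four-legged,
# gray, has-trunk).  "elephant" is described but has no training images.
signatures = {
    "zebra":    np.array([1, 1, 0, 0]),
    "tiger":    np.array([1, 1, 0, 0]),
    "elephant": np.array([0, 1, 1, 1]),
}

def classify_by_attributes(predicted_attrs, signatures):
    """Assign the class whose signature best matches the predicted
    attributes (minimum Hamming distance)."""
    return min(signatures,
               key=lambda c: int(np.abs(signatures[c] - predicted_attrs).sum()))

# Suppose attribute classifiers trained on seen classes report, for a novel
# image: not striped, four-legged, gray, has a trunk.
attrs = np.array([0, 1, 1, 1])
label = classify_by_attributes(attrs, signatures)
```

The attribute classifiers themselves are trained only on seen classes; the signature table is what carries knowledge across to the unseen one.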
Larochelle, H., Erhan, D., and Bengio, Y., Zero-Data Learning of New Tasks,
Proceedings. 23rd National Conference in Artificial Intelligence, vol. 1, no. 2, pp.
646-651, 2008.

Machine learning has in the past required many thousands of training values and
much human involvement to recognize objects. With the rise of a new technique,
Zero-Shot Learning, systems are now becoming able to recognize unknown objects by
transferring characteristics of known objects to those of unknown objects. This process
often requires humans telling the machine that a certain unknown object has the same
characteristics as an object in the training data. The goal of computer scientists is to
produce a system for which no human involvement is necessary, where the system itself
can automatically recognize similarities between seemingly different objects as the
human brain is capable of doing. As of now, machines are semi-supervised. The
machines are able to link different attributes to words so if a new object is introduced to
the system with a definition, the machine is able to detect a known word and realize that
the two objects share a characteristic.

M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych, and B. Schiele, What Helps Where -
And Why? Semantic Relatedness for Knowledge Transfer, Proc. IEEE Conf.
Computer Vision and Pattern Recognition (CVPR), 2010.

For a computer to recognize a certain object, it first has to be introduced to
thousands of photographs of the object. Computer scientists are working to connect
known traits from known objects to traits of unknown objects so that they do not have to
put in thousands of pictures for every new object encountered. For example if the
computer recognizes a desktop computer, but not a laptop, the goal is to get the computer
to realize similarities between the two objects, like the fact that both objects have screens.
This has been difficult since the computers have been taking in data and storing them as
distinct and very specific numbers which makes connecting different objects difficult
unless a person would do it manually. Now, if the computers could store data as words, it
would be much easier to connect two objects since words represent broader categories
than numbers and the computer could connect two objects with common words without
manual assistance. For example, the desktop computer and laptop may have different
sized screens or types of screens, but they are still both screens and now the computer can
return some data on the laptop even though it does not recognize it fully whereas before,
the computer would not be able to return any data for the laptop. This paper has helped
me understand the theoretical solution to the problem of identifying objects not
previously exposed to the computer. This would also save enormous amounts of data
because computers would not have to go through thousands of pictures to recognize a
new object.
Olga Russakovsky and Li Fei-Fei, Attribute Learning in Large-Scale Datasets, in Trends
and Topics in Computer Vision, pp. 1-14, Springer, 2012.

The large amount of image data available on the internet can be used to develop
more sophisticated and robust models and algorithms to index, retrieve, organize and
interact with images and multimedia data. The image database ImageNet, built on
WordNet's roughly 80,000 noun synsets (sets of synonyms), is a useful resource for
visual recognition applications such as image classification. ImageNet aims to populate
the 80,000 synsets with an average of 500-1000 clean images. The data is collected using
Amazon Mechanical Turk. The database is used as a training resource to transfer
knowledge of common attributes to learn new rare objects and as a benchmark dataset
with high quality to test new algorithms. The significance of the dataset is that this set
introduces new semantic relations for visual modeling. Because ImageNet is uniquely
linked to all nouns of WordNet, whose synsets are richly interconnected, the semantic
relations of different words can be used to learn new models in zero shot learning. The
authors illustrate the usefulness of the ImageNet through applications in object
recognition, image classification and automatic clustering. The scale, accuracy, diversity
and hierarchical structure of the database provide a good training resource for developing
zero shot learning methods in computer vision.

Palatucci, M., Pomerleau, D., Hinton, G., Mitchell, T., Zero-Shot Learning with
Semantic Output Codes. Neural Information Processing Systems (NIPS), Dec,
2009

Formerly, the research involved in zero-shot learning has been interested in describing
unknown objects based on a foundation of known objects that are pre-defined in a
database. This research is being applied to a new concept: comparing two unknown
objects. A system built on today's technology could recognize unknown objects at a basic
descriptive level, but what would happen if two similar unknown objects were
introduced? Likely the current system would misclassify the second object as the first
unknown object which was similar to it. The technology in this article could potentially
combine the power of discriminative attributes and comparison. First, if discriminatory
variables were used, there would be a lower chance of confusion, but now a system is
being built so that two similar unknown objects can be distinguished by comparison
statements that would further discriminate the two objects. This would be very useful in
expanding the bounds of computer recognition. Right now, recognition is heavily based
in the original training set so scientists had to pick objects demonstrating many
characteristics so that they could be recognized in new objects. If this technology could
introduce comparisons between unknown objects, the focus could come off of the
original data set and onto the attributes themselves which would make the system more
pliable. Then, the original data sets could be made smaller which would free up lots of
memory and make the system quicker because to program an original data set requires
thousands of images.
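One simple reading of the semantic-output-code idea can be sketched as follows: learn a linear map from image features into attribute space using only seen classes, then classify any image, including one from an unseen class, by its nearest class prototype in that space. Everything below (prototypes, features, the generator matrix) is synthetic, chosen only to make the mechanism visible:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical semantic prototypes: each class is a point in attribute space.
prototypes = {"dog":   np.array([1.0, 0.0, 0.0]),
              "cat":   np.array([0.0, 1.0, 0.0]),
              "horse": np.array([0.0, 0.0, 1.0]),
              "fox":   np.array([0.5, 0.5, 1.0])}   # unseen: no training images
seen = ["dog", "cat", "horse"]

# Synthetic image features: a noisy linear function of the class prototype.
M_true = rng.normal(size=(3, 5))                    # hidden feature generator
X, S = [], []
for name in seen:
    for _ in range(30):
        X.append(prototypes[name] @ M_true + rng.normal(0, 0.01, 5))
        S.append(prototypes[name])
X, S = np.array(X), np.array(S)

# Learn the semantic output map (features -> attribute space) by least squares.
W, *_ = np.linalg.lstsq(X, S, rcond=None)

def classify(x):
    s = x @ W                                       # project into semantic space
    return min(prototypes, key=lambda c: float(np.linalg.norm(prototypes[c] - s)))

# A novel "fox" image lands near the fox prototype despite zero fox images.
x_new = prototypes["fox"] @ M_true + rng.normal(0, 0.01, 5)
pred = classify(x_new)
```

Because the prototype list includes unseen classes, the nearest-prototype step is where the "comparison between unknowns" described above becomes possible.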
Parikh, D., and Grauman, K., Interactively Building a Discriminative Vocabulary of
Nameable Attributes, Proc. IEEE Conf. Computer Vision and Pattern
Recognition (CVPR), pp. 1681-1688, 2011.

Human-nameable visual attributes play an important role in attribute-based object
classification, as they are used as part of the process of object recognition. This article
introduces an approach to define a vocabulary of attributes that are recognizable by
humans and discriminative, so as to mark objects with distinct characteristics. The
characteristics need to be defining characteristics so that objects can be separated using
the fewest characteristics. Also, characteristics need to be very salient so that interpreting
the results and separating the object from other objects becomes very easy. In research at
APL, choosing the right attributes to test for is very important because if the right
attributes are selected, there can be fewer of them, yet they can be more effective in
labeling objects. This reduces the amount of data needed for classification which
contributes to speed and memory space which can be used to train the system to
recognize more objects. This works much like a game of 20 questions. The better
questions asked, the fewer that one needs to ask.

Parikh, Devi, and Grauman, Kristen, Relative Attributes, Proceedings of the
International Conference on Computer Vision (ICCV), 2011.

In visual object classification tasks, nameable visual attributes observable in
images facilitate zero-shot learning, where one trains a classifier for an unseen object by
specifying which attribute it has. Most of the existing techniques restrict visual attribute
properties to binary labels specifying whether the property is present or not (e.g., striped
or not). A binary description of the property limits the information gained from an
attribute. In this research, a more informative and intuitive description for the attribute is
used by defining relative attributes (e.g. is more than, less than). Relative visual
properties are those which humans use that describe and compare objects in the world.
The method learns a ranking function that predicts the relative strength
of each property in the new image. The zero-shot learning aspect of this approach will
allow the system to use relative attributes to compare both known and unknown objects
(e.g. bears (unseen) are furrier than cows (seen)). The authors show that by using
relative attributes, a richer textual description of new images, more precise for human
interpretation, can be obtained. This will be important in creating a more efficient
machine that can ask fewer questions to obtain a result. The fewer questions needed, the
less data space taken, which will decrease error and calculation time.
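The ranking function mentioned above can be illustrated in two lines: a relative attribute is a learned linear score r(x) = w·x, and comparisons fall out of comparing scores. The weights and features below are invented for illustration, not learned:

```python
import numpy as np

# A relative attribute is scored by a ranking function r(x) = w.x: a higher
# score means "more" of the attribute.  The weights and image features below
# are hypothetical; a real system learns w from ordered training pairs.
w_furry = np.array([0.8, -0.2, 0.5])

cow  = np.array([0.3, 0.9, 0.2])              # features of a seen image
bear = np.array([0.9, 0.1, 0.8])              # features of an unseen image

def furriness(x):
    return float(w_furry @ x)

# Instead of the binary "furry or not", the ranker supports comparisons:
bears_are_furrier_than_cows = furriness(bear) > furriness(cow)
```

Chaining such comparisons ("furrier than cows, less furry than rabbits") is what lets the paper place an unseen class between seen ones.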

Burlina, Philippe M., Schmidt, Aurora C., and Wang, I-Jeng, Zero-Shot Deep Learning
from Semantic Attributes, IEEE International Conference on Machine Learning
and Applications, Dec. 2015.

In image classification, when no training samples are provided for some classes,
classification of these unseen classes is not possible. From a human perspective, humans
are able to classify many images at a very early age without being exposed to labeled
samples. In this study, several approaches based on the use of semantic class attributes
(e.g., is the object edible, furry, spotted, friendly?) are investigated. The hypothesis in using the
semantic attributes is that if attribute classifiers can be developed for existing classes,
then they can be transferred to yet unseen classes; and that if attributes can be decided for
yet unseen classes, this can help identify these classes' labels based on semantic relations
between classes and attributes. Several methods for determining attributes are presented.
These methods include (a) an approach based on attribute classifiers and (b)
Maximum-a-Posteriori (MAP) and Minimum-Mean-Square-Error (MMSE) attribute
estimators that use image classifiers for known classes. Training and testing data for
these methods were obtained using a dataset composed of ImageNet images and
Human218 attributes. Preliminary performance of the methods suggests good feasibility
for the development of zero-shot learning methods applied over larger datasets.

Rai, Piyush, 'MATLAB for Machine Learning,' University of Utah, Fall 2011.

This report provides a basic introduction to MATLAB, with examples in
machine learning to reinforce concepts. MATLAB can be installed on Windows or UNIX
platforms. Invoking MATLAB usually starts an Integrated Development Environment
(IDE) that includes a window for the REPL, editor windows for writing code, a command
history window, a file system browser, etc. MATLAB has a command window, which is
a REPL (Read-Evaluate-Print Loop), and a program editor window where MATLAB
programs are developed. The command window interprets and executes commands in
the window, like a calculator. MATLAB offers features such as script files, which
contain user-defined functions; these files consist of MATLAB commands and
statements. The most basic primitive for data representation in MATLAB is the matrix:
MATLAB supports creating matrices, accessing matrix elements, performing
mathematical operations on matrices, and using built-in functions. MATLAB allows for
easy data representation for machine
learning and a programming environment where many machine learning algorithms are
available for Zero Shot Learning problems. MATLAB also provides functions to import
and export data. For zero shot learning, the attributes and feature sets are stored in
MATLAB format for easy retrieval. MATLAB provides a rich set of functions to plot
2-Dimensional and 3-dimensional plots easily which allows one to see the classification
accuracy and other images used in classification. MATLAB provides an efficient storage
alternative, sparse matrices, which store only the non-zero entries. LIBSVM, used for
generating the model, requires its input in sparse format, and MATLAB allows for
easy passing of data for model creation.
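The sparse-storage idea, keeping only the non-zero entries, can be sketched without any library: a dictionary keyed by (row, column) plays the role of MATLAB's sparse matrix (a toy stand-in, not MATLAB's actual data structure):

```python
# MATLAB's sparse storage keeps only the non-zero entries; the same idea in
# plain Python is a dictionary keyed by (row, column).
def to_sparse(dense):
    """Store a list-of-lists matrix as {(i, j): value} for non-zero entries."""
    return {(i, j): v
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0}

dense = [[0, 0, 3],
         [0, 0, 0],
         [1, 0, 0]]
sparse = to_sparse(dense)   # keeps 2 entries instead of 9
```

This is also the shape of LIBSVM's input format, which lists only each example's non-zero feature indices and values.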

Razavian, A. S., Azizpour, H., Sullivan, J., and Carlsson, S. "CNN Features
Off-the-Shelf: An Astounding Baseline for Recognition", Proceedings of the 2014
IEEE Conference on Computer Vision and Pattern Recognition Workshops pp.
512-519, 2014
Convolutional neural net OverFeat features are very powerful features for image
classification and attribute detection. These generic descriptors (OverFeat features)
extracted from convolutional neural networks consistently give results superior to
state-of-the-art visual classification systems on various datasets. The OverFeat system
obtains its accurate classification rate by using a linear Support Vector Machine
Classifier which is then applied to a feature representation of size 4096 that was extracted
from a layer in the net. In attribute detection, an attribute is defined as a semantic or
abstract quality which different categories share. In the experiment on attribute detection,
two attribute data sets, one containing shape, part or material (UIUC 64) and other
containing 9 human attributes (the H3D dataset) were used along with OverFeat features.
Linear kernels with LIBSVM in a one-vs-one scheme were then used for multi-class classification.
At APL, research on zero-shot learning primarily deals with OverFeat features, and
understanding the process by which the OverFeat network was selected is important in
understanding the zero shot learning projects.

Martin, Ricardo, Attributes Presentation, Oct. 11, 2011.

This presentation provides a basic understanding of the use of visual attributes for
zero-shot learning. Visual attributes are visual qualities of objects, such as red, striped,
or spotted; nameable attributes are those capable of being named or identified. An
attribute-based detector first predicts the presence of an array of visual properties (e.g.,
spotted, metallic) and then uses the outputs of those models as features for object
classification. Nameable attribute discovery can be performed by using a feature space
to identify attributes that are discriminable; discriminable attributes can also be obtained
from product descriptions. Attributes are used for zero-shot learning as follows: train
relative attributes on a set of categories; describe unseen categories with comparisons
(bears are furrier than giraffes but less furry than rabbits; lions are larger than dogs, as
large as tigers, but less large than elephants); build a model based on the attributes for
the unseen categories; and test the accuracy of that model. Unseen categories are
modeled as Gaussian distributions in attribute space constrained by the category
definition.
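The final step, modeling an unseen category as a Gaussian in attribute space, can be sketched as follows. The mean and spread below are hypothetical numbers consistent with comparison-style definitions; a real system would derive them from the relative-attribute constraints:

```python
import numpy as np

# An unseen category defined only by attribute comparisons can be modeled as
# a Gaussian in attribute space.  The mean and per-attribute spread below are
# hypothetical values for a "bear" category in (furriness, size) space.
mean = np.array([0.7, 0.4])
std = np.array([0.1, 0.15])        # diagonal Gaussian: independent attributes

def log_likelihood(x):
    """Diagonal-Gaussian log density of an attribute vector x."""
    z = (x - mean) / std
    return float(-0.5 * np.sum(z**2 + np.log(2 * np.pi * std**2)))

# A test image whose predicted attributes sit near the category definition
# scores higher than one far from it.
near = log_likelihood(np.array([0.68, 0.42]))
far = log_likelihood(np.array([0.1, 0.9]))
```

Classification then amounts to assigning each test image to the category Gaussian under which its predicted attributes are most likely.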

Sharmanska, V., Quadrianto, N., and Lampert, C.H., Augmented Attribute
Representations, Proc. European Conf. Computer Vision (ECCV), 2012.

This article discusses a new learning method to find the mid-level feature
representation which combines the semantic attribute representation with non-semantic
features derived from images. Semantic representation augmented with non-semantic
attributes derived from images was found to improve the object classification in a visual
classification system. Semantic attributes are used to transfer knowledge in zero-shot
classification system where training data is not available. The method is extended to
cases where few training samples are given either with class annotation (supervised) or
without it (unsupervised). In this method the semantic attribute representation is
augmented with additional non-semantic midlevel features to improve the classification
accuracy. The non-semantic part of the representation is learned and added to the
semantic part. The additional feature dimensions overcome the shortcomings of the semantic ones.

Socher, R., Ganjoo, M., Manning, C.D., and Ng, A.Y., Zero-Shot Learning Through
Cross-Modal Transfer, Advances in Neural Information Processing Systems 26
(NIPS), 2013.

This article proposes a new system called a joint system. The purpose of this
system is to combine advances in many new technologies, such as zero-shot learning,
which identifies new objects; one-shot learning, which can determine objects based on a
few descriptive words or attributes; and knowledge and visual attribute transfer, which
has the ability to classify different objects and categorize them. The joint system model has two
different methods, one for known objects and one for unknown objects. The system puts
pictures through one set of procedures if it is a known object and another set of
procedures if the object is new or unknown. This technology finally implements zero shot
technology into a larger framework that includes memory (recognizing familiar items)
and categorization (describing new objects). This technology is in its incipient stages but
would be revolutionary towards the work being done at APL in order to recognize
unknown objects. By providing a way to store known objects and classify unknown
objects within the same substructure, scientists could not only store mass amounts of
data, but use attributes of the known objects to reinforce the categorization of unknown
objects. By using the technique of a joint system along with the technology of zero shot
learning at APL, the error in classification would drop significantly to produce accuracy
of 90% and above.

Yu, Xiaodong, and Aloimonos, Yiannis, Attribute-Based Transfer Learning for Object
Categorization with Zero/One Training Example, in Computer Vision - ECCV
2010, pp. 127-140, Springer, 2010.

In one-shot or zero-shot learning problems, the object categories have only one or
no training example per category for classification. Conventional learning algorithms
cannot function due to lack of training examples. To solve these problems, knowledge
transfer is important wherein prior knowledge obtained from known categories is
transferred to unknown categories via object attributes. Object attributes are high-level
descriptions of object categories such as color, texture, shape, parts, context etc. The
semantic knowledge of the attributes represents common properties across different
categories and they can be used to transfer knowledge between known and unknown
categories. In this paper, an attribute-based transfer learning framework is developed. A
generative attribute model based on the Author-Topic model is used to learn the
probability distribution of image features for each attribute (attribute priors). These attribute
priors are used to (1) classify unseen images of target categories (zero shot learning) and
(2) facilitate learning classifiers for target categories when there is only one training
example per target category. The main contributions of the method are (1) a generative
attribute model that offers flexible representations for attribute knowledge transfer and
(2) two methods that use attribute priors in learning target classifiers and combine them
with the training examples of target categories when available. The Animals with
Attributes dataset is used to demonstrate the performance of the methods.
