Vocabulary

Pattern recognition is a field of application for many applied mathematics methods, such as statistics, Bayesian networks, decision trees, eigenvector-based discrimination methods ("classical data analysis"), artificial neural networks, fuzzy decision arrays, genetic classifiers, support vector machines, expert systems, case-based reasoning, ...
It may also be applied to "shapes" that are defined in many ways:
- images: for example, optical character recognition (OCR)
- signals: for example, speech recognition
- data: for example, answers to a questionnaire
- multisensor signatures: for example, biometric applications
These various physical sources of data/signals/images mean that "preprocessing" draws on many applied mathematics techniques, such as vision/image processing, signal processing, data analysis, ... Because every applied mathematics domain has its own vocabulary, it is important to choose a vocabulary for pattern recognition that is not tied to any particular technique. We also have to give a clear definition of 'pattern recognition'. Even if some readers disagree with our definitions, they will be useful for our purpose: giving a very short description of the main difficulties, methodologies, and pitfalls to avoid. We will consider that a pattern recognition system can be described with the following standard block diagram:
Raw data / raw signals / raw images come directly from sensors or measurement systems. Typical preprocessing elements are:
- transforms (e.g. Fourier transform)
- normalisations
- denoising
- ...
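To make this concrete, here is a minimal preprocessing sketch in Python/NumPy; the moving-average denoiser, the min-max normalisation and the function name are assumptions chosen for the example, not prescriptions of the methodology:

import numpy as np

def preprocess(raw_signal, smoothing_window=5):
    """Toy preprocessing chain: denoising, normalisation, Fourier transform."""
    x = np.asarray(raw_signal, dtype=float)

    # Denoising: simple moving average (one possible choice among many).
    kernel = np.ones(smoothing_window) / smoothing_window
    denoised = np.convolve(x, kernel, mode="same")

    # Normalisation: rescale to the [0, 1] range.
    span = denoised.max() - denoised.min()
    normalised = (denoised - denoised.min()) / span if span > 0 else denoised * 0.0

    # Transform: magnitude of the discrete Fourier transform.
    spectrum = np.abs(np.fft.rfft(normalised))
    return normalised, spectrum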
The descriptor extraction system builds a "signature" from the preprocessed and/or raw data. The "signature" is a set of data that is supposed to contain useful and "easy to use" information for classification. Elements of the signature may be:
- shape parameters (e.g. Fourier descriptors, shape descriptors, ...),
- a selected part of the preprocessed and/or raw data,
- scores (from scoring methods),
- ...
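A minimal sketch of a descriptor extraction step, assuming the input "shape" is a closed 2D contour; the particular signature (a few Fourier descriptor magnitudes plus two simple shape parameters) is only an illustration:

import numpy as np

def signature(contour_xy, n_descriptors=8):
    """Build a toy signature from a closed contour given as an (N, 2) array."""
    pts = np.asarray(contour_xy, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]  # complex representation of the contour

    # Fourier descriptor magnitudes, normalised by the first harmonic.
    coeffs = np.fft.fft(z - z.mean())
    mags = np.abs(coeffs[1:n_descriptors + 1])
    fourier_desc = mags / (mags[0] + 1e-12)

    # Simple shape parameters: perimeter and bounding-box aspect ratio.
    perimeter = np.abs(np.diff(z, append=z[:1])).sum()
    width, height = pts.max(axis=0) - pts.min(axis=0)
    aspect = width / (height + 1e-12)

    return np.concatenate([fourier_desc, [perimeter, aspect]])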
The classification system takes a set of descriptors and/or raw data and evaluates which "class" it should belong to.
One could also say that the output of the classification system is a qualitative variable with N modalities; every modality of this variable is called a "class".
There are many classification techniques, such as:
- classical statistics-based methods,
- Bayesian networks,
- decision trees,
- eigenvector-based discrimination methods,
- artificial neural networks,
- fuzzy decision arrays,
- genetic classifiers,
- support vector machines,
- expert systems,
- case-based reasoning,
- ...
Some classification techniques can tune their parameters automatically in order to optimise their performance on a sample of data; we will talk about "supervised learning", even if classical data analysis techniques, for instance, never use such a vocabulary. A minimal sketch follows.
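For instance, a minimal supervised-learning sketch using scikit-learn; the decision tree and the toy signatures are assumptions made only for the illustration:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: each row is a signature, each label is the class (a modality).
signatures = np.array([[0.1, 2.0], [0.2, 1.8], [0.9, 0.3], [1.0, 0.4]])
classes = np.array(["a", "a", "b", "b"])

# "Supervised learning": the classifier tunes its parameters on the sample.
classifier = DecisionTreeClassifier(max_depth=2).fit(signatures, classes)

# The output is a qualitative variable: one class per new signature.
print(classifier.predict(np.array([[0.15, 1.9]])))  # expected: ['a']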
Other classification techniques may use the knowledge of experts; we will talk about knowledge-based systems.
The postprocessing system allows:
- to estimate the confidence of the classification,
- to re-evaluate the class, taking into account the risk attached to misclassification,
- to add a class number (N+1) that says "unrecognized".

The most frequently asked question is: "how do I build an efficient pattern recognition system?"
Because the "heart" of such a recognition pattern system is the
"classification" system, one often considers that the first task is to
choose a classification technique : "should i use neural networks ?
Bayesian networks, ...".
This is the wrong way.
The first task is to choose a global architecture : if one uses heavy
preprocessings and smart descriptors extractions, classification may
become easier. On the opposite, classification from rough data may need
to choose a powerful classification technique. The balance between
preprocessings, descriptors extraction, and classification complexity
is the first point to check. There doesn't exist a unique solution :
many solutions may lead to acceptable results.
There is a methodology that helps to build efficient solutions and to
evaluate their viability before the development : the AGENDA
methodology (see further and check references).
How to build an efficient pattern recognition application: the AGENDA methodology
The AGENDA methodology proposes to write a functional analysis of the pattern recognition system before starting any development. From this functional analysis, one can choose technical solutions, linking every technical choice to a functional need. This "classical" way of working (classical for regular industrial projects, but not for pattern recognition projects) allows:
- to share the knowledge of several experts on the raw data,
- to choose functional options before any technical choice that could disable a functionality,
- to brainstorm on the choice of algorithms,
- to guarantee traceability of every technical choice,
- to reuse solutions from one project to another if some functionalities are the same,
- to test every module separately,
- to help with maintenance and the development of further evolutions,
- to communicate quickly with other researchers and engineers (every choice has an explicit reason).

AGENDA proposes a particular way of describing the functionalities of a pattern recognition system. This way is general and can be applied to a wide range of applications. The idea is to consider the known causes of variation of the input vector.
Example: character recognition. Known causes of variation are:
- small rotation of the character
- translation
- thickness of lines
- style (font)
- colour of printing
- name of the character (1, a, 2, b, c, ...)
- ...
These causes of variation combine to produce a very large number of different characters. That is why pattern recognition is a complicated task: only a few causes of variation of the input vector should lead to a variation of the output (e.g. the name of the character), while most causes of variation of the input vector should not change the output (e.g. thickness, rotation, translation, ...).
This means that the pattern recognition system can be seen as a "filter" on factors of variation.
This leads to a very simple way of describing the expected functionalities of a pattern recognition system: one just has to decide, for each factor, whether it is to be filtered or not. For the character recognition example:
- small rotation of the character: to be filtered
- translation: to be filtered
- thickness of lines: to be filtered
- style (font): to be filtered
- colour of printing: to be filtered
- name of the character (1, a, 2, b, c, ...): useful factor of variation, not to be filtered
- ...
Every factor to be filtered should then be linked to a technical solution that is "invariant with this cause of variation". In the AGENDA graph, these links are a memory of the goal of every technical choice. If one cannot find such a solution for some factors of variation, one can still route those factors to a "learning" classification system. A sketch of such a factor table is given below.
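As an illustration, such a functional description can be captured in a small table; this is only a sketch, and the "solutions" attached to each factor are hypothetical examples, not choices made by the AGENDA methodology itself:

# Each factor of variation is mapped to a decision and, when it must be
# filtered, to the technical solution supposed to be invariant to it.
agenda_graph = {
    "small rotation": {"filter": True, "solution": "rotation-invariant descriptors"},
    "translation": {"filter": True, "solution": "centering during preprocessing"},
    "thickness of lines": {"filter": True, "solution": "skeletonisation"},
    "style (font)": {"filter": True, "solution": "learning classification system"},
    "colour of printing": {"filter": True, "solution": "binarisation"},
    "name of the character": {"filter": False, "solution": None},  # useful factor: it is the output
}

# Factors with no dedicated solution are routed to the learning system.
to_learning_system = [factor for factor, spec in agenda_graph.items()
                      if spec["filter"] and spec["solution"] == "learning classification system"]
print(to_learning_system)  # ['style (font)']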
It becomes obvious, looking at such an "AGENDA graph", that many solutions exist: many technical items may provide the same "invariant with" functionality; other sensors may lead to a simpler solution; ...

But the AGENDA graph can also bring more information:
- every link has a cost: computing time, ...
- intuitively, one can understand that a graph with every link pointing to the classification learning system would need more examples to get tuned than the graph above (for example: no preprocessing, no descriptor extraction, and raw data fed directly into a neural network).
The idea is to use design plans (design of experiments) in order to build the data base: orthogonal design plans, over the set of factors that point to the learning system, should be considered as a simple way of building it. This data base is needed for:
- getting a learning data base (if the classification is automatically tuned from data) or a tuning data base (if the classification is tuned by hand),
- getting a test data base, used to evaluate the performance of the classification system on examples that were not used to tune it.
Although there is a deterministic and simple way of building a relevant data base, one needs to cut it into two parts in order to get two sub-data bases: the learning one and the test one. This is done by using two fractional sub-design plans, as sketched below.
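A minimal sketch of this idea, assuming a full factorial design over the factors that still point to the learning system and a simple alternating split into two halves standing in for the fractional sub-design plans; the factor levels are invented for the example:

from itertools import product

# Levels of the factors that still point to the learning system.
factors = {
    "style (font)": ["Arial", "Times", "Courier"],
    "name of the character": ["a", "b", "1", "2"],
}

# Full factorial design: every combination of factor levels is one example
# to acquire or generate for the data base.
design = list(product(*factors.values()))

# Two fractional halves: one for learning/tuning, one for testing.
learning_plan = design[0::2]
test_plan = design[1::2]

print(len(design), len(learning_plan), len(test_plan))  # 12 6 6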
Then, it is important to verify that the two data bases represent the same statistical phenomenon in the variable space (preprocessed data and descriptors): mean values, standard deviations, and the N first eigenvectors must be the same, as in the check sketched below.
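A minimal version of such a check with NumPy; the tolerance and the comparison of the N first eigenvectors of the covariance matrices are assumptions made for the sketch:

import numpy as np

def same_statistics(learning_db, test_db, n_eig=3, tol=0.2):
    """Rough check that two data bases describe the same phenomenon."""
    a, b = np.asarray(learning_db, float), np.asarray(test_db, float)

    same_mean = np.allclose(a.mean(axis=0), b.mean(axis=0), atol=tol)
    same_std = np.allclose(a.std(axis=0), b.std(axis=0), atol=tol)

    # Compare the directions of the N first eigenvectors of the covariance.
    _, va = np.linalg.eigh(np.cov(a, rowvar=False))
    _, vb = np.linalg.eigh(np.cov(b, rowvar=False))
    top_a, top_b = va[:, -n_eig:], vb[:, -n_eig:]
    same_axes = np.allclose(np.abs(np.sum(top_a * top_b, axis=0)), 1.0, atol=tol)

    return same_mean and same_std and same_axes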
At this point, a classification technique is carefully picked. The criteria of choice are:
- do I have explicit expertise for classifying examples? (If yes, knowledge-based systems such as fuzzy decision arrays may be a good solution: they are usually tuned "by hand". If not, learning-based methods that are automatically tuned from data, such as neural networks or Bayesian networks, may be a good solution.)
- do I need an explicit explanation of the result? (If yes, "black box" methods are ruled out.)
- do I have software that lets me apply classification techniques? (Or can I buy one? Develop one? ...)
There are often several classification techniques that fit these criteria, and it is good practice to test several approaches in order to know which part of the result is a "method" effect; a minimal comparison sketch follows.
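For example, a minimal comparison of two candidate techniques by cross-validation with scikit-learn; the choice of classifiers and the toy data are assumptions made only for the illustration:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Toy learning data base: signatures and their classes.
X = np.array([[0.1, 2.0], [0.2, 1.8], [0.9, 0.3], [1.0, 0.4],
              [0.15, 2.1], [0.95, 0.2], [0.05, 1.9], [1.1, 0.5]])
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# Comparing the scores gives a feel for the size of the "method" effect.
for name, clf in [("naive Bayes", GaussianNB()),
                  ("decision tree", DecisionTreeClassifier(max_depth=2))]:
    scores = cross_val_score(clf, X, y, cv=4)
    print(name, scores.mean())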
Once the classification system is tuned, it is time to build the postprocessing system. The aim of this system is to interpret the quantitative partial information produced by the classification system in order to attach a confidence to the final result. A lack of confidence should lead to a new output (never given by a classification system itself): "unknown" or "unrecognized". Methods for confidence estimation differ for every classification method, but the main ideas are:
- choose a measure of confidence for the selected class,
- choose a measure of hesitation (comparing the confidence of the selected class to the confidence of the other classes),
- use a decision method based on these measures, as sketched below.
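A minimal postprocessing sketch working on class probabilities (for instance the output of a probabilistic classifier); the thresholds and the confidence/hesitation measures are assumptions chosen for the example:

import numpy as np

def postprocess(class_probabilities, classes, min_confidence=0.7, min_margin=0.2):
    """Turn class probabilities into a final decision with a reject option."""
    p = np.asarray(class_probabilities, dtype=float)
    order = np.argsort(p)[::-1]

    confidence = p[order[0]]               # confidence of the selected class
    hesitation = confidence - p[order[1]]  # margin over the runner-up class

    # Decision rule: reject as "unrecognized" (class N+1) when unsure.
    if confidence < min_confidence or hesitation < min_margin:
        return "unrecognized", confidence
    return classes[order[0]], confidence

print(postprocess([0.05, 0.50, 0.45], classes=["a", "b", "c"]))  # ('unrecognized', 0.5)
print(postprocess([0.05, 0.90, 0.05], classes=["a", "b", "c"]))  # ('b', 0.9)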
Once the pattern recognition system is available, tests often reveal cases of misclassification. Analysing those cases allows the AGENDA graphs to be upgraded: some factors of variation are added, new preprocessings/descriptors can be picked, and the data base can be upgraded.
This incremental way of building the system is a guarantee of maintainability: indeed, industrial applications often need to evolve over time (new functionalities, new classes, ...). The AGENDA graphs can then be considered as a memory of the engineering process, allowing very quick reverse engineering for maintenance and upgrades.