coevolution   <<   accurate evaluation   >>   the DELPHI algorithm

accurate evaluation

Contents:

This page provides an informal discussion of how accurate evaluation may be achieved in coevolution. For a more detailed, technical exposition of these issues, please consult the original article on which this webpage is based:

De Jong, E.D. and J.B. Pollack (2004). Ideal Evaluation from Coevolution [PS] [PDF]. This is the final manuscript of the article. The final version of this article will be published in Evolutionary Computation, Vol. 12, Issue 2, published by The MIT Press.

1. Coevolution

Coevolution algorithms are evolutionary computation methods in which the evaluation of individuals is influenced or even determined by other evolving individuals. Here we will assume that the quality of an individual is completely determined by other evolving individuals. Thus, individuals on which evaluation is based may be viewed as tests which assess different aspects of an individual, where these aspects together determine the value of the individual as a solution.

2. Learners and Tests

We distinguish between two kinds of individuals: learners and tests. All individuals used as candidate solutions to our problem, and which we therefore want to be of high quality, are called learners. In many coevolution algorithms, all individuals are viewed as learners. For instance, one may set up two populations of chess players against each other, and hope that an arms race will cause both populations to increase in quality. However, if the goal of each population is just to improve over the current other population, then there is no incentive to provide accurate evaluation of this other population. Thus, neither population may have access to reliable information about the relative quality of its own individuals. This makes sustained progress unlikely, especially if multiple objectives underly a problem. Therefore, it may be a good idea to dedicate part of the individuals in a coevolutionary setup to the task of providing accurate evaluation for the others. Such individuals are called tests.

Note that the predicates of learner and test are roles; whether an individual is a learner or a test depends on how the individual is used in the coevolutionary algorithm. In a single population setup for example, individuals can be used both as learner and as test.

In some problems, the distinction between learners and tests will be obvious, as tests may be of a different type. In Hillis's seminal sorting networks research for example [Hil90], the performance of sorting networks was evaluated by means of coevolving test sequences called parasites. Thus, the sorting networks functioned as learners and the test sequences as tests. In this example, the distinction between learners and tests is clear as the individuals of the two classes have a different genetic structure.

3. Test-based problems

What kind of problems can be addressed by coevolutionary algorithms that distinguish between learners and tests? Briefly, the answer is any problem for which the performance of individuals (learners) can be assessed by means of a series of tests. The notion of test is to be taken broadly here; a classifier can be tested by considering its performance on examples (test points), but playing against a certain opponent in chess can also be viewed as a test, as the outcome of the game reveals information about the quality of the learner. More generally, a test is any procedure that reveals some information about the quality of a learner. Accordingly, when addressing test-based problems, the goal will be to find learners that combine the ability to solve different tests.

3.1 When can coevolution be used for test-based problems?

In principle, coevolution can be used for any test-based problem. However, for problems where the set of tests is small and performing a test is not too expensive, it may be feasible to evaluate a learner on all tests, and there would be no need to use coevolution to evolve the tests. Therefore, coevolution may be useful when the set of tests is very large.

4. Towards Accurate Evaluation

How can we set up the coevolutionary process such that the resulting evaluation will be accurate? This is the central question with which we are concerned here. To talk about accurate evaluation, we first have to make clear what type of solution we are looking for, i.e. what is our solution concept [Fic03]?

To answer this question, we first observe that individual tests can be seen as objectives, i.e. as partial quality measures that together determine the quality of a learner. This observation leads naturally to the adoption of the Evolutionary Multi-Objective Optimization (EMOO) framework. The idea of viewing the tests in coevolution as objectives in the sense of EMOO is called Pareto-coevolution.

The solution concept employed by Pareto-coevolution is described in more detail below. Two other solution concepts that may be used in coevolution are provided by theoretical approaches to coevolution based on game theory and order theory.

4.1 Game Theory

The game theory perspective provides the notion of Nash-equilibrium, defined as a combination of strategies for which no player can profitably deviate given the other strategies; see e.g. [Osb94]. For work using game theory to model coevolution, see e.g. [Fic00, Fic03].

An attractive feature of the Nash equilibrium as a solution concept is that the set of learners it identifies may be relatively small compared to the Pareto-front, which may be very large; this feature may facilitate the identification or approximation of good solutions.

A disadvantage of the Nash equilibrium as a solution concept for optimization is that a Nash equilibrium is not necessary of high quality; a Nash equilibrium can be dominated by other Nash equilibria, meaning there are other sets of learners that obtain higher or equal outcomes for all tests (and a higher outcome for at least one test). Thus, if the solution concept is the Nash equilibrium, there is no guarantee that a solution will be of high quality. It may however be possible to define a stricter solution concept based on the notion of a Nash equilibrium; this may make it possible to guide game-theoretic algorithms for coevolution towards the equilibria of interest.

Recently, Ficici has introduced an archive based on the Nash-equilibrium solution concept [Fic03]. Given an unbounded memory, the Nash-memory archive can provide monotonic progress for symmetric games, i.e. problems where learners and tests come from the same space.

4.2 Order Theory

Another formalism that can be used to model preference relations between different individuals in a coevolutionary setup is the mathematical theory of (pre-)orders. Using order theory, coevolution can be modeled in such a way that the resulting solution concept is a generalization of the Pareto-front concept [Buc03]. This provides a way to study coevolution using the mathematics of order theory.

A valuable concept from order theory is the view of tests as inducing an ordering on learners. This view provides an alternative way in which the requirements for accurate evaluation may be studied. Using this approach, it can be shown that given a set of tests, the subset of maximally informative tests induces the same order on a set of learners as the whole set of tests [Buc03]. Thus, the set of maximally informative tests satisfies the criteria for a Complete Evaluation Set (CES) and is sufficient to provide accurate evaluation of learners. An interesting recent finding in this research is that transitive games such as the compare-on-one problem can be more difficult for coevolutionary algorithms than intransitivity [Buc03a].

5. Pareto-Coevolution

Pareto-Coevolution was first described in two simultaneous papers [Fic00, Wat00]. In Pareto-coevolution, tests are viewed as objectives in the sense of Evolutionary Multi-Objective Optimization (EMOO). Below, we will give a brief summary of the main concepts of EMOO. A more detailed account of EMOO is beyond the scope of this webpage; for more information, the reader may consult Carlos Coello Coello's EMOO website, which provides a substantial collection of references, online articles, and other resources.

A main concept in EMOO is Pareto-dominance. An individual x dominates another individual y if and only if:

x has at least as high a value for each objective as y
and
x has a higher value than y for at least one objective

If an individual is dominated by another individual, the latter individual will always be preferable as a solution. Thus, an individual that is dominated by another available individual will not be part of the solution of an EMOO algorithm. If on the other hand an individual is not dominated, it has some combination of features that makes it unique. More specifically, a non-dominated individual cannot be improved on some objective without decreasing the value of another objective (if this were possible, the result would dominate the individual).

When all possible individuals are considered, the selection of non-dominated individuals is called the Pareto-optimal set or the Pareto-front. The Pareto-front is the ideal solution of an EMOO problem. For many problems however the complete Pareto-front is very large. Therefore, in practice the goal for an EMOO algorithm is to find a representative selection of the Pareto-front.

5.1 Test-based problems as multi-objective problems

How does the EMOO framework relate to coevolution? In other words, what are the objectives in a test-based problem? As remarked above, the idea of Pareto-coevolution is to view each test as a single objective, and this defines a valid solution concept. Thus, any test-based problem can be seen as a multi-objective problem. For many problems of interest however, the tests will not all be independent. There may be a smaller, more meaningful set of objectives that determines the outcomes of the tests.

Example: When testing the strength of a bridge, we can imagine a series of tests applying increasing loads to the bridge to see if it can bear the load: 100 kg, 500 kg, 2000 kg, etc. If the bridge can bear a heavy load (e.g. 2000 kg), it will also be able to hold any lighter loads. Therefore, the three tests in this example can be seen as testing on the same objective, namely a strength objective. Here it is clear from the descriptions of the tests that the tests represent different thresholds for a single underlying objective, namely strength. However, the same relation between tests can be present when no such description is available. For instance, it may be the case that 10 different opponents in chess represent increasingly difficult thresholds for a single underlying objective of chess, which might be like 'tactics', or 'quality of play in the mid-game'. This is the case when each opponent fails at least those players failed by the previous opponent and one or more other players.

5.2 The underlying objectives of a test-based problem

The above makes clear that there can be hidden or underlying objectives in a problem. If this is the case, i.e. unless all tests test on entirely different aspects, we only need information about some of all possible tests to compare different individuals. For example, if we know that one candidate solution to a bridge coevolution problem holds the 2000 kg load and another bridge does not, then we don't need to perform any other weight tests to know which of the two designs is the stronger one. Moreover, for a given set of learners, the set of tests required can be further reduced by realizing that a comparison in EMOO can only have three possible outcomes, as will be discussed below.

In joint work with Anthony Bucci, we are studying how the underlying objectives may be extracted from a given problem. Implicitly, a test-based problem defines a space, and the idea behind this research is to study the structure of this space; for more information on this, see Anthony's page on the geometric view of coevolution. Experimental results with the DELPHI algorithm suggest that coevolution methods can identify the underlying objectives of a problem.

6. Ideal Evaluation

From the above, we can conclude that any test-based problem has some set of underlying objectives, and that individual tests give us information about differences between learners in their objective values. A natural question then is: which tests do we need in order to perform accurate evaluation of learners? To answer this question, we first observe that there can only be three cases when comparing two learners a and b:

Thus, to perform ideal evaluation, we only need to know for every pair of learners whether one learner dominates the other or not. We therefore define the ideal evaluation function Fideal(a, b) as a function that tells whether learner a dominates learner b:

Fideal(a, b) = dominates(a, b)

where dominance is based on all underlying objectives of the problem. A learner a dominates another learner b if a is higher than b in some objective while b is not higher than a in some objective. Thus, the only type of information we need in evaluating learners is whether one learner is higher than another in some objective. The underlying objectives of a problem follow from the test outcomes. Therefore, even if we don't know the underlying objectives of a problem (as in the case of chess for example), we know that any higher value for an objective must correspond to a higher outcome for some test (otherwise) and vice versa.

6.1 The Complete Evaluation Set (CES)

Whenever a test returns a higher value for one learner than for another, we say the test makes a distinction between the two learners. The notion of distinctions was introduced by Sevan Ficici in [Fic01]. Using this concept, we can now consider the following property which a set of tests may possess:

if there exists a test that makes a distinction between two learners a and b,
then the evaluation set contains such a test

A set of tests satisfying this property is called a Complete Evaluation Set, or CES. The CES was first described in [DeJ02]. It can be shown that evaluating learners using a Complete Evaluation Set results in ideal evaluation. Formally, the evaluation function that uses a CES as tests is equal to the ideal evaluation function that follows from the (typically unknown) underlying objectives:

FCES(a, b) = Fideal(a, b)

A formal proof of this is given in [DeJ04]. An important property of the CES is that it is of a limited size; specifically, a CES for n learners need never contain more than n2 - n tests.

7. Approximating Ideal Evaluation

The previous section defined the Complete Evaluation Set (CES), a set of tests that is sufficient to provide ideal evaluation. Given a population of learners, could a coevolutionary algorithm identify such a Complete Evaluation Set? To answer this question, we first observe that a CES is limited in size; it need never be larger than the square of the number of learners. This is an important property; the total set of tests may be very large or even infinite, but to evaluate a set of n learners we only need a set of at most n2 tests.

A theoretical question is whether we can search for a CES such that convergence is guaranteed. This is not difficult; it is sufficient to generate tests such that every possible test has a non-zero probability of being generated, and keep any test that makes a distinction that is not made by the set of tests kept so far. In this way, convergence to the CES is guaranteed. This shows that in the limit of identifying such a CES, accurate evaluation for a given set of learners can be achieved.

A practical problem with the above approach is that unless tests for all distinctions have been found, it cannot be known whether all possible distinctions have been found. Thus, when no new distinctions are found for some time, this may be either because more search is required, or because all possible distinctions between the given learners have already been made. A possible approach to this problem would be to measure the rate at which new distinctions are found, and estimate the probability that new distinctions can still be made.

By approximating a CES for every generation of learners, we would have a coevolutionary algorithm that can approximate ideal evaluation at a tunable degree of accuracy. However, for any generation of learners, this algorithm is expected to spend several generations of evolving tests. Such an algorithm is not expected to be an efficient way of approximating ideal evaluation. Another approach is to balance the effort spent on evolving learners and test. An algorithm taking this approach is called DELPHI, and is described in detail on the following page.

References

[Buc03] Bucci, A. and Pollack, J.B. (2003). A Mathematical Framework for the Study of Coevolution. Foundations of Genetic Algorithms (FOGA'02).

[Buc03a] Bucci, A. and Pollack, J.B. (2003). Focusing versus Intransitivity. Geometrical Aspects of Coevolution. Genetic and Evolutionary Computation Conference (GECCO-03).

[DeJ02] De Jong, E.D. and J.B. Pollack (2002). Principled Evaluation in Coevolution. Technical Report no. CS-02-225, also available in single-spaced version. Computer Science Department, Brandeis University. May 31, 2002.

[DeJ04] De Jong, E.D. and J.B. Pollack (2004). Ideal Evaluation from Coevolution [PS] [PDF]. This is the final manuscript of the article. The final version of this article will be published in Evolutionary Computation, Vol. 12, Issue 2, published by The MIT Press.

[Fic00] Ficici, S.G. and Pollack, J.B. (2000). A Game-Theoretic Approach to the Simple Coevolutionary Algorithm. Parallel Problem solving From Nature (PPSN-'00). M. Schoenauer and K. Deb and G. Rudolph and X. Yao and E. Lutton and J. Julian Merelo and H.-P. Schwefel. Springer LNCS series, vol. 1917.

[Fic01] Ficici, S.G. and Pollack, J.B. (2001). Pareto Optimality in Coevolutionary Learning. Sixth European Conference on Artificial Life. J. Kelemen (ed.). Springer, Berlin.

[Fic03] Ficici, S.G. and Pollack, J.B. (2003). A game-theoretic memory mechanism for coevolution. Genetic and Evolutionary Computation Conference (GECCO-03).

[Hil90] Hillis, W.D. (1990). Co-evolving parasites improve simulated evolution as an optimization procedure. Physica D, 42:228-234.

[Osb94] M.J. Osborne and A. Rubinstein (1994). A Course in Game Theory. Cambridge, MA: The MIT Press.

[Wat00] Watson, R.A. and Pollack, J.B. (2000). Symbiotic Combination as an Alternative to Sexual Recombination in Genetic Algorithms. Parallel Problem solving From Nature (PPSN-'00). M. Schoenauer and K. Deb and G. Rudolph and X. Yao and E. Lutton and J. Julian Merelo and H.-P. Schwefel. Springer LNCS series, vol. 1917.


Created by Edwin de Jong

coevolution   <<   accurate evaluation   >>   the DELPHI algorithm