Sunday, February 28, 2010

How to test engine strength

I've seen a few people share their engine tests, and I wanted to mention the concept of test suites, as I consider them the most accurate way of testing an engine.

When you test engines, you want to remove as many variables as possible. Many people test engines with an opening book, which is a mistake. Unless the opening book is 100% neutral (meaning that every line ends with an evaluation of 0.00), one side starts with an edge and the test will be inaccurate. The easiest way to remove that problem is to not use an opening book at all. However, the engines then have to spend time evaluating e2-e4 and other basic opening moves, which leaves them less time for the positions that really make a difference.

That is where test suites come in. Suites are special opening books that start games from fixed positions in order to measure engine strength. This removes the need for traditional opening books, and also lets the engines put their full power into the important positions instead of wasting time on basic moves such as e2-e4. Another important aspect of test suites is that the results can be reproduced 100% on the same hardware. A standard engine match cannot, as an engine may choose to open with e2-e4 one time and c2-c4 the next.
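To make the idea concrete, here is a rough sketch of how a suite turns into a fixed list of games. It assumes each starting position is played twice with the colors swapped, which is a common convention but not something specific to any one suite. The two positions below are ordinary illustrative FEN strings, not taken from the Silver Openings Suite, and `build_schedule` is a made-up helper name.

```python
# Illustrative starting positions in FEN notation (assumption: these are
# NOT positions from any real test suite, just well-known examples).
SUITE = [
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",    # after 1.e4
    "rnbqkbnr/pp1ppppp/8/2p5/4P3/8/PPPP1PPP/RNBQKBNR w KQkq c6 0 2",  # after 1.e4 c5
]


def build_schedule(positions, engine_a, engine_b):
    """Return one game per (position, color assignment).

    Assumption: every position is played twice, once with each engine
    as White, so neither engine benefits from a lopsided position.
    """
    games = []
    for fen in positions:
        games.append({"fen": fen, "white": engine_a, "black": engine_b})
        games.append({"fen": fen, "white": engine_b, "black": engine_a})
    return games


schedule = build_schedule(SUITE, "Firebird 1.1", "Rybka 3")
for number, game in enumerate(schedule, start=1):
    print(f"Game {number}: {game['white']} vs {game['black']} from {game['fen']}")
```

Because the list of games is fixed up front, two runs of the same match start from exactly the same positions, which is the point of using a suite instead of letting the engines pick their own openings.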

If you really want an accurate measurement of engine strength, test suites are the best way to test engines against each other.

Currently, I use the Silver Openings Suite, but I believe Fritz also ships with a pre-installed test suite called the Nunn test suite (named after its author, GM John Nunn).

Here's a good article that covers the basics and gives step-by-step instructions on how to use the Silver Openings Suite.

http://www.chessbase.com/newsdetail.asp?newsid=6147


Just as a side note, I finished testing Firebird 1.1 against Rybka 3 using the Silver Openings Suite, with time controls of 4+2. Results were +31=51-11, in favor of Firebird 1.1.
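For anyone who wants to turn a +W=D-L result like that into a single number, here is a small sketch. It counts a draw as half a point and then applies the standard logistic Elo formula. That conversion is my own addition, not something the test itself reports, and the function names are made up for illustration.

```python
import math


def score_fraction(wins, draws, losses):
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Estimated rating gap implied by a score fraction.

    Assumption: the standard logistic Elo model, where an expected score
    p corresponds to a gap of -400 * log10(1/p - 1).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


wins, draws, losses = 31, 51, 11  # Firebird 1.1's result against Rybka 3
score = score_fraction(wins, draws, losses)
elo = elo_difference(score)
print(f"{wins + draws + losses} games, score {score:.1%}, about {elo:+.0f} Elo")
```

For this match that works out to 93 games, a score of roughly 60.8%, and a gap of about 76 Elo. With under a hundred games the margin of error is wide, so treat the number as a rough indication only.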
