Evaluation Criteria

This page describes the criteria given to the judges when we ask them to evaluate the generated settlements.

These are the actual instructions sent to judges in 2020.

Judges are sent the generated settlements; in previous years this was three maps per generator. They are then asked to walk through the settlements and score each generator on four criteria, which we defined based on what we believe are the challenges in moving towards more human-like settlement generation: Adaptability, Functionality, Narrative and Aesthetics. Each category is worth 10 points, and the following questions are provided to the judges to guide their evaluation. These questions are illustrative of each category, but not exhaustive. They should give judges an idea of what the category is about, but they do not cover the complete scope of the category.

Adaptability

  • Do the structures in the settlement adapt to the terrain?
  • Do the structures reflect the environment, i.e. usage of available material, adaptation to the biome?
  • Does the settlement take advantage of terrain features or compensate for problems with the terrain?
  • Are the settlements different in reaction to the different initial maps?
  • Are there any other ways in which the settlement adapts to the given maps?

Functionality

  • Does the settlement provide protection from danger? Does it keep mobs out or from spawning? Does it protect from other environmental dangers?
  • Is the settlement accessible to a player avatar in survival mode? Can you walk everywhere? Is it easy? Are there faster modes of transport? How easy is it to find your way around?
  • Does the settlement reflect the embodiment of the player avatar? Is it appropriately scaled?
  • Does the settlement make resources easy to obtain? Is it easy to get food? Does the settlement provide the player with additional affordances?
  • Does the settlement provide functionality to the villagers?

Narrative

  • Does the settlement evoke an interesting story? Could you give a short description of what this settlement is about that sets it apart from other settlements?
  • Is it clear what the function of the settlement is? Does this function make sense with regard to the terrain and environment it is in? I.e. is the logging camp in a forest, the harbour town at the sea?
  • Does the functionality of the settlement support this narrative function? I.e. does the fortified frontier settlement have functioning walls, is the farming village equipped with functioning fields?
  • Does the final settlement give any indication of how the settlement developed? Can you look at the settlement and imagine the order in which things were built, or what stages the development of the settlement took? Is there an indication of the history of the settlement evident in the structure?
  • Are there any convincing and consistent allusions to human cultures or specific points in history that the settlement is modelled after? Does the settlement have its own culture? Can you tell things about this culture just by observing the settlement?

Aesthetics

  • Does the settlement look good?
  • Is there a consistent look to the settlement? Does it appear that all structures belong to the same settlement?
  • Is there an appropriate level of variation in the existing structures?
  • Are there any jarring features that make the settlement look unbelievable?

In 2020 we also added short intro texts for each category:


Adaptability is about how well a given settlement fits into the map and the surrounding terrain. Is the design shaped by the given map, and is the map in turn shaped by the settlement? The challenge here is to not just generate something that can be put on any map, but to generate something that reflects the input (the map).


Settlements in general, and in Minecraft in particular, are not just aesthetic artefacts but also provide functionality in different forms. There are different kinds of functionality. First, there are issues that are not even dependent on Minecraft, like how easy it is to walk through the settlement, or how easy it is to navigate. There are also functions that are more closely tied to the game Minecraft, like keeping the monsters out, or providing enough light so monsters don’t spawn, or making food and crafting stations accessible to the player. This is definitely a category where the list of questions is not exhaustive - it is possible to provide extra affordances that we have not thought of.

Evocative Narrative

In real life, settlements tell stories about the people who built them, about their lives, their culture, and their history. Settlements are also living testaments to the history that shaped them. When we look at settlements in reality, or even those designed by humans for games, we get a sense of what this settlement is about, or how it was shaped over time. The challenge here is to automatically produce a settlement that evokes a distinct story - and ideally one that is adapted to the underlying input, the map.

Maintaining Aesthetics

The last criterion is not necessarily about beauty, but about a consistent look. The challenge here is not only to produce something that looks like good design, but to do so while also addressing the other challenges. It is relatively simple to design a really great looking house and just copy it several times. But having an algorithm design houses that look well designed is a different question. This category contains a lot of elements that humans might get right without thinking about it, like building houses that are visually balanced, or well proportioned. As a corollary, this category is also about avoiding those jarring, strange artefacts that procedural design generates and that would never be made by a human.

Chronicle Challenge

The second iteration of the competition introduced the chronicle challenge. This was an optional extra challenge that was to be evaluated separately from the regular competition. Contestants were to indicate during submission that their entry was competing in this challenge.
For the chronicle challenge, the generator produces a written book in Minecraft and places it inside the generated settlement – ideally in a place where it is easily found. The book should contain the chronicle of the settlement (in English) and should contain a narrative on how this particular settlement came about. The entries are evaluated based on how well the given chronicle fits the generated settlement and on overall quality.

Those entries that participate in the chronicle challenge should be evaluated on two items:

  1. Overall quality of the narrative, i.e. is it a good text or document?
  2. Fit, i.e. how well does a given chronicle fit into that specific settlement? Would it matter if you switched the books around between two settlements, or is it a one-size-fits-all text?

Score Range

The idea is to score the entries in each of those categories, to encourage participants to think about how to tackle each of those problems. Each category can score from 0 to 10.

  • 0 points means that the resulting design shows no consideration of that particular criterion at all.
  • 1 - 4 points means that there are some aspects in which the criterion is addressed.
  • 5 points indicates a performance, in that area, that could be from a naive human. At this point, you would not be surprised if this was built by a human.
  • 6 - 9 points indicates an expert level human performance, over a longer time, possibly a group effort. So, we are talking about a group of city planners and architects designing a Minecraft settlement over the course of a year. The higher end of the point scale here should mean a work that would possibly win a design prize.
  • 10 points - superhuman performance - this is so good that you would be surprised if it could be produced by a dedicated group of expert humans.
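
For anyone tallying score sheets, the bands above can be written down as a small lookup. The following is only a minimal Python sketch and not tooling used by the competition: the names `band` and `check_sheet`, the short band labels, and the assumption that judges enter whole-number scores are all illustrative, and the example sheet contains made-up values.

```python
# Hypothetical helper: maps a 0-10 category score to the rubric band it falls in.
# Category names follow the four criteria on this page; everything else is assumed.
CATEGORIES = ("Adaptability", "Functionality", "Narrative", "Aesthetics")


def band(score: int) -> str:
    """Return a short label for the rubric band that `score` belongs to."""
    # Assumption: scores are whole numbers (bool is rejected explicitly).
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("score must be a whole number")
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score == 0:
        return "no consideration"
    if score <= 4:
        return "partially addressed"
    if score == 5:
        return "naive human"
    if score <= 9:
        return "expert human"
    return "superhuman"


def check_sheet(sheet: dict) -> dict:
    """Validate one judge's sheet for one generator and label each category."""
    missing = [c for c in CATEGORIES if c not in sheet]
    if missing:
        raise KeyError(f"missing categories: {missing}")
    return {c: band(sheet[c]) for c in CATEGORIES}


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = {"Adaptability": 3, "Functionality": 5, "Narrative": 0, "Aesthetics": 7}
    for category, label in check_sheet(example).items():
        print(f"{category}: {label}")
```

The sketch deliberately does not combine the four categories into a total or average across judges, since this page does not specify how scores are aggregated.
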
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License