Benchmarks for Rules Analysis

Having read hundreds (maybe thousands) of posts on this board, I've noticed that almost everyone has different benchmarks for how they analyze the rules. I am not talking about different play styles. I am talking about the different scenarios that people treat as their default for rules analysis.

Some players discuss rules in terms of maximum efficiency for a single encounter. Others discuss rules in terms of PvP effectiveness. Others discuss rules in terms of daily effectiveness. And all too often debates start because one person makes a point based on one benchmark and then another person tries to counter the point with an entirely different benchmark.

I would like to propose that we create a set of benchmarks that allow for more coherent analysis. I would also like to weight them (in terms of importance to Alliance rules design), but I think that would create way more arguments than anyone wants to deal with, so I'd ask that we not tackle that in this thread. Personally, once a bunch are created, I'll just list my weighting in my signature. Others can do the same and then at least we will have a common lexicon and sense of how people view rules design.

As a starter, I propose benchmarks be in two categories, level and encounter type.
Level -> Low (2-15), Mid (16-30), High (31+)
Encounter Type -> Wave Battle, PvP, Full Weekend, Module

They can be looked at separately or in combination like this:

High level character (full weekend)
Mid level character (full weekend)
Mid level character (wave battle)
Mid level character (module)
Mid level character (PvP)
Low level character (full weekend)
Low level character (wave battle)
Low level character (module)

This is a rough idea, but I think having benchmarks is good because it makes conversations more productive. For example, the new proficiency rules are pretty good for low level characters (assuming monsters drop in health like they should), probably break even for mid level characters, and are negative for high level characters. That rule change is probably at its worst in PvP (not sure yet about other encounter types).

Please don't get too caught up on that example (it really was just off the cuff).

Finally, if I had to rank these tentatively defined benchmarks, it would be something like:
Full Weekend > Low > Mid > Module > High > Wave Battle >> PvP



I honestly don't think there can be objective benchmarks for rules analysis, especially where combat is concerned, due to the huge differences between how combat is run and scaled in various chapters. People choose the scenarios they do because that is what they see happening at the games they attend.


Seattle Staff
Or we could abandon this path altogether, learn from it, and establish a new one from what’s already been learned from feedback.
Based on the responses, I am going to assume that people here aren't really familiar with how benchmarks work or what value they offer.



Playtest Community Manager
Benchmarks are standards against which other things can be measured. Your post doesn’t actually establish any benchmarks - you’ve simply listed a series of different types of encounters, and assigned what looks like an entirely arbitrary level spread with no chapter data to support it.

It has been brought up in other threads that you may not have played the game in some time, nor have you participated in any active playtesting. While your insights as they relate to a certain specific subset of the game’s history or locations are appreciated, please keep in mind that they are not necessarily a reflection of how the game is currently played, either in general or in specific chapters. They are also, without having had the opportunity to engage in thorough playtesting, not necessarily going to align with the actual experiences people have had throughout the development of the v2 process. These factors may potentially contribute to the creation of benchmark metrics that are inconsistent with the operation and expectations of the current live product across all user groups, or even the currently proposed playtest product across all actively playtesting chapters. It is especially for this reason that we have been asked to only fill out the playtest feedback forms for each cycle if we have actively participated in a playtest, so that the benchmarking that ARC and the owners are engaging in is based on data that has been tried and validated in a live environment. Fortunately, 59 of the 65 responding members on these boards indicated that their chapters were conducting some form of playtesting, which bodes well for the likelihood of most playtest forum users having had some measure of relevant playtest experience.

While I and others may value the experience you bring to the table, I strongly feel that this is an instance in which supplementing your existing experiences with live playtesting may help bring a number of things into better focus. If you have not yet had the opportunity to do so, I would encourage you to advocate at your local chapter for the scheduling of a playtest or two. In that way, the community could benefit from feedback on how your past experiences interacted with and compared to your playtesting experiences. I am certain that the owners and ARC would appreciate such actions, as any additional playtesters would better support making the v2 product as successful as it can be.


Benchmarks do not work well for qualitative analysis, and I greatly question whether qualitative change is not the goal of the whole initiative.

Moreover, given the sheer complexity of the game, these benchmarks would have to be pages long to mean a damn. For example, modules include traps, puzzles, atmospheric effects, single encounter, no escape, and multi encounter.

Wave battles can be single part, town split, static target, gauntlet, timed, etc.

While I understand how neat it is to have benchmarks, it quickly becomes like the KISS principle: seeking to sacrifice the glorious mess of accuracy and freedom of design for the calm numbness of the lie of simplicity.

So while I think a discussion about metrics used for judgement is greatly valuable for a sharing of perspectives, as a rule set to evaluate a rule set I will pass.

Joe S.