When should you equate test forms?

Vincent Lima

When you launch a certification or licensing test—or any pass-fail test—one of your tasks is to determine the passing score.

The passing score separates those who have demonstrated the required level of knowledge or competence to hold the credential from those who haven’t done so yet.

It’s important to remember that this passing score applies to the specific set of questions that appear in that initial form of the test. (Wondering why? See this video.) As a routine matter of test maintenance, you will continually update the test. With each update, you will need to consider the passing score.

Why and how do you do that?

Athlete clears the bar
No matter where the meet is held, the bar should be set at the specified height.
Image © Joe. Adobe Stock.

You do it because you want to be fair. You want to ask candidates to clear the same bar no matter which form of the test they happen to take.

That’s why. As for how, it depends on the extent to which you changed the test.

If the test blueprint is substantially the same and you have simply introduced new questions, you will need to consider statistical equating.

Sometimes the changes are negligible: As a rule of thumb, if less than 10 percent of the test content has changed, and the difficulty of the new items closely matches the difficulty of the items they are replacing, it isn’t necessary to equate the passing score.

If, on the other hand, the new items account for more than 10 percent of the test content, or if some of the new items differ in difficulty from the items they are replacing, you will want to use statistical equating to set the passing score of the updated form.

All that assumes the test blueprint is substantially unchanged.

If you have revamped the test, statistical equating is not an option; you will need to set the passing standard anew. For example, your test for taxi drivers used to have questions about the best route to the airport. Thanks to ubiquitous GPS, those questions have been dumped and new questions about bicycle awareness and mobile-phone use have been added. To set the old standard, your stakeholders considered just how well someone needed to know routes before getting a taxi-driver license; now they need to consider a different set of questions. Statistics are no substitute for stakeholder judgment.

I know of one other circumstance where you need to set the standard anew. That’s when your test volume is so low as to make statistical equating impossible. A medical subspecialty that tests 10 candidates a year may need to set the standard annually.
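The decision rules above can be sketched as a small function. This is only an illustration of the rule of thumb, not a substitute for a psychometrician's judgment. The names `FormUpdate`, `passing_score_action`, and `min_volume_for_equating` are hypothetical, and the volume threshold is left as a required argument because no specific number is given here. Treating exactly 10 percent as a case for equating is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class FormUpdate:
    """Hypothetical summary of how an updated form differs from the old one."""
    blueprint_changed: bool    # True if the test was revamped
    fraction_new_items: float  # share of test content that changed, 0.0 to 1.0
    difficulty_matched: bool   # new items closely match the items they replace
    annual_candidates: int     # expected test volume per year


def passing_score_action(update: FormUpdate, min_volume_for_equating: int) -> str:
    """Return one of: 'set new standard', 'keep passing score', 'equate'."""
    # A revamped blueprint calls for stakeholder judgment, not statistics.
    if update.blueprint_changed:
        return "set new standard"

    # Negligible change: under 10 percent new content, difficulty matched.
    if update.fraction_new_items < 0.10 and update.difficulty_matched:
        return "keep passing score"

    # Equating is called for, but it needs enough candidates to be feasible.
    # Checking volume only at this point is an assumption of this sketch.
    if update.annual_candidates < min_volume_for_equating:
        return "set new standard"

    return "equate"


# Illustrative values only; the threshold of 200 is a placeholder.
update = FormUpdate(
    blueprint_changed=False,
    fraction_new_items=0.25,
    difficulty_matched=True,
    annual_candidates=500,
)
print(passing_score_action(update, min_volume_for_equating=200))  # prints "equate"
```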

Such circumstances aside, statistical equating is preferable for two reasons.

First, it’s cheaper than standard setting.

Second, it is fairer. Here’s why.

Consider a test that launches with two forms. You can (and should) set the standard on Form 1, then equate Form 2. What if, instead, you set a standard on Form 1 and set a standard on Form 2? Even if you use the same panel at the same time, you are still setting two standards. They may be the same; then again, thanks to the vagaries of human judgment, they may be different.

With equating, you seek to maximize the chances that someone who took Form 1 would have had the same outcome if Form 2 had been administered instead.
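To make the idea concrete, here is a minimal sketch of one simple method, linear equating, which maps a Form 1 score to the Form 2 score that sits the same number of standard deviations from the mean. Nothing above names a particular method, so this choice is an assumption. The sketch also assumes the two forms were taken by randomly equivalent groups. Operational programs often rely on anchor items or item response theory instead. The function name `linear_equate` and the score lists are made up for illustration.

```python
from statistics import mean, pstdev


def linear_equate(score_form1, form1_scores, form2_scores):
    """Map a Form 1 score onto the Form 2 scale by matching z-scores.

    Assumes the two score samples come from randomly equivalent groups.
    """
    m1, s1 = mean(form1_scores), pstdev(form1_scores)
    m2, s2 = mean(form2_scores), pstdev(form2_scores)
    if s1 == 0:
        raise ValueError("Form 1 scores have no spread; cannot equate.")
    return m2 + (s2 / s1) * (score_form1 - m1)


# Made-up data: Form 2 runs 5 points harder with the same spread.
form1 = [60, 70, 80]
form2 = [55, 65, 75]
print(linear_equate(70, form1, form2))  # prints 65.0
```

In this toy case, a passing score of 70 on Form 1 corresponds to 65 on the harder Form 2, so candidates face the same bar on either form.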