Glossary

While many of the terms below have both mathematically precise definitions and lay connotations, the meanings given below are the ones appropriate to test development.

 

Accreditation

Evaluation by a professional body that an exam or exam development process meets established standards of quality. The two most common exam accreditations in the U.S. are those granted by NCCA and ANSI.

Adaptive Testing

A method of administering items so that a person’s ability can be estimated from a subset of the available items. An item or set of items is administered and scored; the next item or set of items is then selected according to the examinee’s success on prior items.
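To make the administer, score, and select loop concrete, here is a minimal Python sketch of a simplified stepwise procedure, assuming each item has a known difficulty on the same scale as ability. The function name, the `answer` callback, and the step-halving rule are illustrative assumptions; operational adaptive tests typically estimate ability with item response theory models rather than a fixed step rule.

```python
from typing import Callable, Dict, List, Tuple


def adaptive_test(
    item_difficulties: Dict[str, float],
    answer: Callable[[str], bool],
    num_items: int,
    start: float = 0.0,
    step: float = 1.0,
) -> Tuple[float, List[str]]:
    """Administer items adaptively; return (ability estimate, items administered)."""
    estimate = start
    administered: List[str] = []
    remaining = dict(item_difficulties)
    for _ in range(min(num_items, len(remaining))):
        # Select the unused item whose difficulty is closest to the current estimate.
        item_id = min(remaining, key=lambda i: abs(remaining[i] - estimate))
        del remaining[item_id]
        administered.append(item_id)
        # Score it: move the estimate up after a correct answer, down after a wrong one.
        estimate += step if answer(item_id) else -step
        # Shrink the step so the estimate settles as more items are scored.
        step /= 2
    return estimate, administered


# Hypothetical item bank (difficulties -3.0 to 3.0) and a simulated examinee
# who answers correctly whenever an item's difficulty is at most 1.2.
bank = {f"item{k}": k * 0.5 for k in range(-6, 7)}
estimate, used = adaptive_test(bank, lambda i: bank[i] <= 1.2, num_items=6)
print(round(estimate, 2), used)
```

Because the step halves each time, this sketch can move the estimate less than twice the initial step away from the starting value, so the starting value and step would need to suit the ability range being tested.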

Anchor items

Items that appear on every exam form within a specialty. These items are used to equate forms and calibrate items.

Appeals process

A systematic way of reviewing the test scoring and administration process to respond to questions of the test’s fairness in evaluating an examinee.  

A way of preserving the results of the performances tested so that there is evidence on which to base a review or appeals process.  

Behavior

acting with the same range of available performance as in the job setting. Technically, making a choice on a piece of paper is a behavior, but it does not meet the definition above, nor does writing an essay or discussing a topic.

Bias

A skewed distribution of scores attributable to a trait not demonstrably linked to the ability a test is designed to measure. Courts usually accept as de facto evidence of bias a result in which a group passes fewer than 80% of the members who would be predicted to pass if they were not linked to the trait. Tests of performances directly linked to job performance have not been required by the courts to provide evidence regarding bias.
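The 80% threshold in this entry is simple arithmetic on pass rates. A minimal Python sketch, assuming the pass rate predicted for the group is already known; the function names and figures are hypothetical. The widely cited "four-fifths rule" compares a group's selection rate with that of the group with the highest rate, so which comparison rate a program should use is a legal question beyond this sketch.

```python
def impact_ratio(group_passed: int, group_tested: int, expected_pass_rate: float) -> float:
    """Ratio of a group's observed pass rate to the pass rate predicted for it."""
    if group_tested <= 0:
        raise ValueError("group_tested must be positive")
    if not 0 < expected_pass_rate <= 1:
        raise ValueError("expected_pass_rate must be in (0, 1]")
    return (group_passed / group_tested) / expected_pass_rate


def flags_possible_bias(group_passed: int, group_tested: int, expected_pass_rate: float) -> bool:
    """True when the group passes fewer than 80% of the candidates predicted to pass."""
    return impact_ratio(group_passed, group_tested, expected_pass_rate) < 0.8


# Hypothetical example: 30 of 60 group members pass, but 70% were predicted to pass.
print(round(impact_ratio(30, 60, 0.70), 3))  # 0.714
print(flags_possible_bias(30, 60, 0.70))     # True
```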

Binary scoring

Scoring in which items are scored only correct or incorrect.  

Certification

A formal evaluation program that assesses a person’s ability to fulfill specific job duties. A legal certification program must evaluate a representative subset of the job’s requirements and be unbiased with respect to applicants’ gender, racial, and ethnic background. Cf. exam, test, credential, licensure.

Computerized testing

Administration of an exam on a computer, as opposed to paper and pencil or any other form of administration.   

Concepts

any logical or structural constructs that direct behavior.  Attainment of concepts cannot be measured directly;  it can only be measured indirectly by behavior consistent with the concept.   

Confounded

mixing the evaluation of two distinct abilities in one score.  

Controls

the user interface to mechanisms enabling the examinee to influence his trajectory through a test.    Controls usually include Restart, Directions, Finished, and Stop Test.  Additional controls may be More Info, Skip, Next or Back.  

Credential

An assertion by a credentialing body that a person is capable of performing professional duties. Earning the credential often involves a number of components such as classwork, mentoring, and passing certification exams. Cf. exam, test, certification, licensure.

Criterion

The standard by which performance on an exam is judged to be acceptable or not acceptable.  

Criterion referenced test

See Domain-referenced test.     

Cutpoint

The number of items the examinee must answer correctly in order to receive a passing score on the exam. In a weighted exam, this is the weighted score an examinee must achieve in order to pass the exam. See Pass/Fail exam.

Descriptive exam

An exam which attempts to accurately assess the candidate’s ability at all points along the ability scale.   The ability scale is typically a normed scale.  See Norm referenced test.   

Dichotomous scoring

Scoring each item right or wrong. See also Polytomous scoring.

Directions time (item)

time from the presentation of the stimulus to the time the examinee initiates action to complete the item.

Domain referenced test

A test which is scored according to the candidate’s mastery of a domain of knowledge or performance, rather than relative to how others perform. See Norm-referenced test.

EEOC Guidelines

Testing guidelines set up by the Equal Employment Opportunity Commission, established in 1986 and revised in 2000.   These guidelines provide for development and administration of tests that are demonstrably unbiased for all population groups.  

Elapsed time (item)

the time from the presentation of the stimulus to the time the examinee indicates the item has been completed. Cf. Involved time.

Exam

A test administered to human subjects.

Exam Pool

Scored items on an exam.   

Exam review

A process of evaluating the performance of an exam over a specific period of time.    Issues typically addressed in an exam review are item performance characteristics (P-Value, point-biserial), exam performance characteristics (reliability), and content currency.  

Gantt chart

A management chart which depicts tasks on a timeline according to start time for the task and the time required to complete the task.  

Inter-rater reliability

The consistency in scoring between two observers. This is usually measured as a correlation between the observers’ ratings of a specific set of observations.
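A minimal Python sketch of the correlation described here, computing Pearson's r between two observers' ratings of the same performances; the function name and the ratings are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of ratings."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 ratings")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a rater gives identical ratings")
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings of the same five performances by two observers.
rater_a = [4, 3, 5, 2, 4]
rater_b = [5, 3, 4, 2, 4]
print(round(pearson_r(rater_a, rater_b), 3))  # 0.808
```

Correlation reflects consistency in rank order rather than exact agreement: two observers whose ratings always differ by a constant still correlate perfectly. Programs that need exact agreement often report an agreement index, such as percent agreement or Cohen's kappa, alongside the correlation.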

Involved time

the time required to respond to the item. This is measured from the examinee’s first action on an item to the indication of item completion. We would have preferred to use the term Response time, but it has so many psychological associations that we chose the more test-specific term Involved time. Cf. Elapsed time.
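The item timing measures in this glossary (Directions time, Involved time, and Elapsed time) can all be derived from three timestamps. A minimal Python sketch, assuming the delivery software logs when the stimulus is presented, when the examinee first acts, and when the examinee indicates completion; the class and field names are hypothetical. Under these definitions, Elapsed time equals Directions time plus Involved time.

```python
from dataclasses import dataclass


@dataclass
class ItemTiming:
    """Timestamps (in seconds) logged for one item; field names are hypothetical."""
    presented: float     # stimulus appears
    first_action: float  # examinee's first action on the item
    completed: float     # examinee indicates the item is complete

    def directions_time(self) -> float:
        return self.first_action - self.presented

    def involved_time(self) -> float:
        return self.completed - self.first_action

    def elapsed_time(self) -> float:
        return self.completed - self.presented


t = ItemTiming(presented=0.0, first_action=12.5, completed=40.0)
print(t.directions_time(), t.involved_time(), t.elapsed_time())  # 12.5 27.5 40.0
```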

Item

A unit of scoring which includes a stimulus situation (including directions), an opportunity to respond, and a method of scoring the correctness of the response.  An item may contain one or more tasks.    An item is the smallest scored unit.   

Item Development Workshop (IDW)

A meeting or meetings at which items may be authored, edited, reviewed, and accepted or rejected for inclusion on an exam.

Item Pool

Items that are available for use on an exam.

Job Task Analysis

The process by which an exam development group establishes the link between practice and the skills tested on the exam.   The job task analysis is typically a survey which assesses specifically what people do on the job and how often they do it.  

Licensure

The process by which a state or governmental body approves an individual to perform specific practices.   In licensed professions, it is illegal to practice the profession without a license.   

Low stakes test

A test which doesn’t involve monetary or promotion consequences, e.g., a test of prerequisites for a course, or a test to determine the starting point in a course. Typically, these exams are not subject to EEOC Guidelines.

Multiple-Choice Test

a test that constrains examinees’ ability to respond to a situation to an artificially small number of choices.   Sometimes called a selective response test.  

Norm-referenced test

A test in which an individual’s score is given relative to the performance of other individuals.       See also Domain referenced test.      

Omega

A measure of predictive validity per unit time. It is meaningful only within a specific domain. Omega is the last letter of the Greek alphabet; in this case, Omega is the last word in test worthiness.
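The glossary does not give a formula for Omega. Reading "predictive validity per unit time" literally, a minimal Python sketch divides a validity coefficient by testing time; the choice of minutes as the unit, the function name, and the figures are assumptions for illustration. Because Omega is meaningful only within a specific domain, the comparison assumes both tests predict the same job outcome.

```python
def omega(validity_coefficient: float, testing_minutes: float) -> float:
    """Hypothetical Omega: predictive validity per minute of testing time."""
    if testing_minutes <= 0:
        raise ValueError("testing_minutes must be positive")
    return validity_coefficient / testing_minutes


# Hypothetical comparison of two tests for the same domain.
long_test = omega(0.50, 60)   # about 0.0083 per minute
short_test = omega(0.40, 20)  # about 0.02 per minute
print(short_test > long_test)  # True
```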

Pass/Fail exam

An exam in which the results are used only to classify a candidate, and in which the score scale, apart from the cutpoint(s), is not normed or evaluated.

Performance Test

A test in which the response modality is essentially identical to the response modality of the target task.

Performance-Based Test

a test of a person’s ability to indicate how he believes he would act in a given situation.  

Point-Biserial Correlation

The correlation between a dichotomous (2-valued) variable and a continuous variable. In testing, it’s typically the correlation between a right/wrong item score and the total test score.
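A common computing form is r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean total scores of examinees who got the item right and wrong, s is the (population) standard deviation of all total scores, and p and q are the proportions right and wrong. It gives the same value as a Pearson correlation computed on the 0/1 item scores. A minimal Python sketch with hypothetical data:

```python
from statistics import mean, pstdev
from typing import Sequence


def point_biserial(item_scores: Sequence[int], total_scores: Sequence[float]) -> float:
    """Point-biserial correlation between 0/1 item scores and total test scores."""
    if len(item_scores) != len(total_scores):
        raise ValueError("item_scores and total_scores must have the same length")
    if any(s not in (0, 1) for s in item_scores):
        raise ValueError("item scores must be 0 (wrong) or 1 (right)")
    right = [t for s, t in zip(item_scores, total_scores) if s == 1]
    wrong = [t for s, t in zip(item_scores, total_scores) if s == 0]
    if not right or not wrong:
        raise ValueError("undefined when all examinees got the item right, or all wrong")
    p = len(right) / len(item_scores)  # proportion answering correctly
    q = 1 - p
    # r_pb = (M1 - M0) / s * sqrt(p * q), with s the population SD of total scores
    return (mean(right) - mean(wrong)) / pstdev(total_scores) * (p * q) ** 0.5


item = [1, 1, 0, 1, 0]    # hypothetical right/wrong scores on one item
totals = [9, 8, 4, 7, 5]  # the same examinees' total test scores
print(round(point_biserial(item, totals), 3))  # 0.924
```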

Polytomous scoring

Scoring items on a scale. The most common example is giving partial credit for a response.

P-Value

The percent of examinees who answer the item correctly in a calibration sample or during a specific time period.
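A minimal Python sketch of the computation, assuming dichotomously scored items stored as a hypothetical examinee-by-item matrix of 0s and 1s. It returns proportions; multiply by 100 to report percents. A higher P-Value means an easier item, and this P-Value is unrelated to the p-value of a statistical significance test.

```python
from typing import List, Sequence


def item_p_values(responses: Sequence[Sequence[int]]) -> List[float]:
    """Proportion of examinees answering each item correctly.

    responses[e][i] is 1 if examinee e answered item i correctly, else 0.
    """
    if not responses:
        raise ValueError("need at least one examinee")
    num_items = len(responses[0])
    if any(len(row) != num_items for row in responses):
        raise ValueError("every examinee must have a score for every item")
    n = len(responses)
    return [sum(row[i] for row in responses) / n for i in range(num_items)]


# Hypothetical scores: four examinees (rows) by three items (columns).
scores = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]
print(item_p_values(scores))  # [0.75, 0.5, 0.75]
```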

Raw score

The raw score is the number of items correct on the exam, or, alternatively, the percent of items correct on the exam. The percent score can be converted into the number correct if the number of scored items is known, and vice versa.
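Both conversions need only the number of scored items; unscored items are left out of both counts. A minimal Python sketch with hypothetical function names; rounding back to a whole number of items is an assumption that covers percents that were themselves rounded for reporting.

```python
def percent_correct(num_correct: int, num_scored: int) -> float:
    """Convert a number-correct raw score to a percent of scored items."""
    if num_scored <= 0:
        raise ValueError("num_scored must be positive")
    return 100 * num_correct / num_scored


def number_correct(percent: float, num_scored: int) -> int:
    """Convert a percent-correct raw score back to a number of items correct."""
    return round(percent * num_scored / 100)


print(percent_correct(45, 60))   # 75.0
print(number_correct(75.0, 60))  # 45
```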

Ready mode

the default state of the application which allows the scoring program to begin.  It is typically also the state at which the examinee starts each item.   

Recertification

The process by which a person who has been certified is certified as competent to continue practice.       

Reliability

the correlation between two administrations of the test a specified time interval apart, or between equivalent forms of the same test.  

Return on Investment (ROI)

The measure of (Return – Expense) / Expense. In testing, Expense might be the cost of developing and administering the exam, and Return would be the measurable benefits achieved by using the exam. For an exam, ROI would more properly be termed ROX, or return on expenditure, since an investment typically shows up as an asset on a balance sheet, whereas training is typically recorded as an expense (expenditure) rather than an asset.

ROI

Return on Investment. Calculated as (Value – Cost) / Cost.
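The (Return – Expense) / Expense formula under Return on Investment can be computed directly. A minimal Python sketch with a hypothetical function name and figures; how the measurable benefits of an exam are valued is the hard part and lies outside the sketch.

```python
def roi(total_return: float, expense: float) -> float:
    """Return on investment (or ROX) as a fraction: (Return - Expense) / Expense."""
    if expense <= 0:
        raise ValueError("expense must be positive")
    return (total_return - expense) / expense


# Hypothetical program: the exam costs 100,000 to develop and administer,
# and its measurable benefits are valued at 150,000.
print(roi(150_000, 100_000))  # 0.5, i.e. a 50% return
```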

Sample size

The predicted size of sample required to achieve results at the desired level of statistical significance, assuming a sample distribution of specific characteristics. In test design, one important sample size is the number of responses to the job task analysis. Another is the number of candidates required to calibrate an exam or item.
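For a job task analysis survey, one common starting point is the sample size needed to estimate a proportion (for example, the share of respondents who perform a task) within a chosen margin of error. A minimal Python sketch of that standard formula, n = z^2 * p * (1 - p) / e^2, assuming simple random sampling from a large population; the function name is hypothetical, and sample sizes for item calibration depend on the measurement model and are not covered here.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error: float,
                               confidence: float = 0.95,
                               expected_proportion: float = 0.5) -> int:
    """Responses needed to estimate a proportion within +/- margin_of_error.

    Assumes simple random sampling from a large population. An expected
    proportion of 0.5 gives the most conservative (largest) sample size.
    """
    if not 0 < margin_of_error < 1 or not 0 < confidence < 1:
        raise ValueError("margin_of_error and confidence must be between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    p = expected_proportion
    return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


# e.g. estimating the share of job task analysis respondents who perform a task
print(sample_size_for_proportion(0.05))  # 385
print(sample_size_for_proportion(0.03))  # 1068
```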

Scaled scoring

An arbitrary score reported to a candidate instead of the raw score of items correct on an exam.   
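The glossary does not specify how a reporting scale is built. One common approach is a linear transformation that places the raw cutpoint at a fixed reported value, so the passing score looks the same across forms. A minimal Python sketch under that assumption; the function name, the 200 to 800 range, and the passing value of 500 are hypothetical choices, not a standard.

```python
def scale_score(raw: int, raw_cut: int, raw_max: int,
                scale_cut: int = 500, scale_max: int = 800,
                scale_min: int = 200) -> int:
    """Map a raw score onto a hypothetical reporting scale.

    The raw cutpoint maps to scale_cut and a perfect raw score to scale_max;
    the same slope is extended below the cut, and results are clipped to
    the range [scale_min, scale_max].
    """
    if not 0 <= raw_cut < raw_max:
        raise ValueError("need 0 <= raw_cut < raw_max")
    slope = (scale_max - scale_cut) / (raw_max - raw_cut)
    scaled = scale_cut + (raw - raw_cut) * slope
    return max(scale_min, min(scale_max, round(scaled)))


# Hypothetical 60-item exam with a raw cutpoint of 42.
print([scale_score(r, raw_cut=42, raw_max=60) for r in (0, 42, 51, 60)])
# [200, 500, 650, 800]
```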

Simulation

A stimulus situation which mimics another and which contains elements of the other situation.   

Stem

in a multiple-choice test, the part of the item that presents information and asks the question that has to be answered with the choices.  

Task Analysis

See Job Task Analysis.     

Taxonomy

A system of classifying behaviors;   in exam development, a system of classifying item types.  While anyone can make up an arbitrary taxonomy, the most useful taxonomies are those that help the practitioner establish comprehensive scope, link stimuli and scoring methodologies, and create appropriate conditions for observing responses.  

Test

A test is a measure of behavior observed within a predefined context.  A test presents predetermined stimuli to elicit behavior evaluated consistently across examinees. The stimuli sample a domain with the goal of predicting behavior in the entire domain.

Unscored items

Items administered on an exam whose scores do not contribute to the score achieved by the candidate.     

Unscored pool

A set of exam items from which a subset is drawn to administer as unscored items on an exam.   

Validity

The degree to which an exam, certification or license predicts success on the job.  This is often measured by correlation between a test score and a person’s performance on the job.  Theoretically, the correlation between true and measured ability.     

Verisimilitude

The extent to which a simulation mimics its template.  

Weighting

A process of multiplying the score of each item by a value (or weight), then summing the weighted item scores to compute a score for the test.
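A minimal Python sketch of weighted scoring, together with the pass/fail comparison against a weighted cutpoint described under Cutpoint; the function names, scores, and weights are hypothetical. With every weight set to 1, the weighted score reduces to the number-correct raw score.

```python
from typing import Sequence


def weighted_score(item_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Multiply each item score by its weight and sum the products."""
    if len(item_scores) != len(weights):
        raise ValueError("need exactly one weight per item")
    return sum(score * weight for score, weight in zip(item_scores, weights))


def passes(item_scores: Sequence[float], weights: Sequence[float], cutpoint: float) -> bool:
    """Pass when the weighted score reaches the weighted cutpoint."""
    return weighted_score(item_scores, weights) >= cutpoint


scores = [1, 0, 1, 1]           # hypothetical dichotomous item scores
weights = [1.0, 2.0, 1.5, 0.5]  # hypothetical item weights
print(weighted_score(scores, weights))        # 3.0
print(passes(scores, weights, cutpoint=3.0))  # True
```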


©2007..2009 Authentic Testing Corporation
Revised Saturday, 21-Jul-2007 06:48:19 PDT