What is criterion validity?

Last updated

7 February 2023

Author

Dovetail Editorial Team

Reviewed by

Cathy Heath

Criterion validity, or concrete validity, refers to a method of testing how strongly a variable correlates with a concrete outcome. Higher education institutions and employers use criterion validity testing to model an applicant's potential performance. Some organizations also use it to model retention rates.

A properly designed test can predict future outcomes or performance when research has documented a strong correlation between two variables. The correlation coefficient, which ranges from -1.0 to +1.0, indicates the strength of the relationship between the two correlated variables. Before creating such a test and choosing the criterion variable, you must determine which variables are the best predictors of success.

Before moving on, let's differentiate between norm-referenced tests and criterion-referenced tests. You've probably taken a few norm-referenced tests in school, such as the standardized tests administered in grade school and high school.

These tests measure a student's essential knowledge in core subjects, check whether they’re performing at their grade level, and compare their knowledge to that of other students.

We'll consider the difference between norm-referenced and criterion-referenced tests in detail later in this article. First, let's consider the types of criterion validity.

Types of criterion validity

Two types of criterion validity exist:

  • Predictive validity: models the likelihood of an outcome

  • Concurrent validity: confirms whether one measure is equal to or better than another accepted measure when both test the same thing at the same time

Predictive validity

A test that uses predictive validity aims to predict future performance, behavior, or outcome. Either the test administrator or the test taker can use the results to improve decision-making.

Real-world use of predictive validity tests

An employer might administer a predictive validity test to determine whether a person is likely to perform well in a specific job. To do this accurately, the employer must have a large data set of people who have already performed successfully in the job.

This example of employers administering a screening test is the most common application of predictive validity, but there are lots of other uses. For example, psychiatrists and psychologists might administer a psychological personality inventory to help diagnose a patient and gauge the possibility or probability of future behaviors.

General practitioners (GPs) use risk factor surveys as part of their new patient intake packets. The survey's results predict a patient's potential for developing a disease. For example, a doctor might counsel a patient who smokes two packs of cigarettes a day that their behavior increases their risk of developing lung cancer.

High school students who take the SAT are taking a predictive validity test. College students who take the GRE to gain admittance to graduate school are also taking a predictive validity test. The SAT and GRE both offer criterion validity because, over the long term, students' scores have proven to be a valid predictor of their future academic performance, as measured by their grade point average (GPA), once admitted to a college or grad school.
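To make the idea concrete, here is a minimal sketch of how a researcher might quantify predictive validity with a correlation coefficient. The test scores, GPAs, and variable names are invented purely for illustration, not figures from any real study.

```python
# Minimal sketch of quantifying predictive validity: correlate a predictor
# (an admission-test score) with a criterion measured later (first-year GPA).
# All values below are invented for illustration.
from scipy.stats import pearsonr

test_scores = [1100, 1250, 1320, 1400, 1180, 1500, 1230, 1350]  # hypothetical SAT scores
first_year_gpas = [2.9, 3.2, 3.4, 3.6, 3.0, 3.8, 3.1, 3.5]      # hypothetical GPAs earned later

r, p_value = pearsonr(test_scores, first_year_gpas)
print(f"Predictive validity coefficient: r = {r:.2f} (p = {p_value:.3f})")
# The closer r is to +1, the more strongly the test score predicts the criterion.
```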

Concurrent validity

A test that uses concurrent validity measures the same criterion as another test. To ensure the accuracy of your test, you administer it alongside an already accepted test that has been scientifically shown to measure the same construct.

By comparing the results of both tests, you can determine whether the one you have developed accurately measures the variable you’re interested in. This type of criterion validity test is used in the fields of:

  • Social science

  • Psychology

  • Education

Real-world use of concurrent validity tests

If a psychologist developed a new, self-reported psychological test for measuring depression called the Winters Depression Inventory (WDI), they'd need to test its validity before using it in a clinical setting. They'd recruit non-patients to take both inventories—their new one and a commonly accepted, established one, such as the Beck Depression Inventory (BDI).

The sample group would take both inventories under controlled conditions. The psychologist would then compare the two sets of results for each member of the sample group. This process determines whether the new test measures the same criterion at least as well as the accepted gold standard.

In statistical terminology, when the results of the sample population's WDI and BDI match or are close to each other, they're said to have a high positive correlation. In this scenario, the psychologist has established the concurrent validity of the two inventories.
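As a rough sketch of that comparison, the snippet below correlates paired scores from the article's fictional WDI and the established BDI for the same participants. The scores, and the 0.7 rule-of-thumb threshold mentioned in the comment, are assumptions made for illustration.

```python
# Sketch of assessing concurrent validity: correlate scores from a new
# inventory (WDI, fictional) with scores from an established one (BDI)
# collected from the same participants. All scores are invented.
from scipy.stats import pearsonr

wdi_scores = [12, 25, 7, 31, 18, 22, 9, 27]  # hypothetical new-inventory scores
bdi_scores = [10, 27, 6, 33, 17, 24, 8, 29]  # hypothetical established-inventory scores

r, _ = pearsonr(wdi_scores, bdi_scores)
print(f"Concurrent validity: r = {r:.2f}")
# A strong positive correlation (a common rule of thumb is r of roughly 0.7
# or higher) supports the claim that both inventories measure the same construct.
```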

How to measure criterion validity

That brings us to measuring criterion validity. In our example of the psychology inventories, we discussed determining if the inventories functioned concurrently and to what extent they correlated, either positively or negatively.

To measure criterion validity, use an established metric. There are many options to choose from (a short sketch after the lists below shows how each is computed), including:

  • Pearson correlation coefficient

  • Spearman's rank correlation

  • Phi coefficient

Which correlation coefficient or method you use depends on:

  • Whether you're analyzing a linear or non-linear relationship

  • The number of variables in play

  • The distribution of your data
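As a minimal sketch of these options, the snippet below computes each of the three coefficients listed above with scipy; the data sets are invented solely to show the calls. For two binary (0/1) variables, the phi coefficient equals the Pearson coefficient of the codings, which is how it is computed here.

```python
# Illustrative sketch: computing the three coefficients named above.
# All data are invented purely to demonstrate the function calls.
from scipy.stats import pearsonr, spearmanr

predictor = [55, 62, 70, 48, 80, 66, 59, 73]          # e.g. screening-test scores
criterion = [3.1, 3.4, 3.8, 2.9, 4.0, 3.5, 3.2, 3.9]  # e.g. later performance ratings

# Pearson: strength of a linear relationship between continuous variables
r_pearson, _ = pearsonr(predictor, criterion)

# Spearman: rank-based, captures monotonic but non-linear relationships
rho_spearman, _ = spearmanr(predictor, criterion)

# Phi: association between two binary variables; for 0/1 codings it is
# numerically the same as the Pearson coefficient of those codings
passed_screening = [1, 1, 0, 0, 1, 1, 0, 1]
retained_one_year = [1, 1, 0, 1, 1, 1, 0, 1]
phi, _ = pearsonr(passed_screening, retained_one_year)

print(f"Pearson r = {r_pearson:.2f}, Spearman rho = {rho_spearman:.2f}, phi = {phi:.2f}")
```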

Advantages of criterion validity

Criterion-referenced tests offer numerous advantages over norm-referenced tests when used to measure student or employee progress:

  • You can design the test questions to match (correlate to) specific program objectives.

  • Criterion validity offers a clear picture of an individual's command of specific material.

  • You can create and manage criterion-referenced tests locally.

  • The local administrator, such as a doctor or teacher, can diagnose problems using the test results and work with the individual to improve their situation.

Disadvantages of criterion validity

Criterion-referenced tests also have some significant disadvantages:

  • Building these reliable and valid test instruments is expensive and time-consuming.

  • You can't generalize findings beyond the local application, so they don't work to measure large group performance across a broad set of locations.

  • The test takers could invalidate results by accessing the test questions before taking the test.

Applications of criterion validity

To use criterion validity, you need both a predictor variable and a criterion variable. Examples of predictor variables include scores on the GRE or SAT. For the criterion variable, you need a known, valid measure of the outcome you want to predict.

In some areas of study, such as social sciences, the lack of relevant criterion variables makes it difficult to use criterion validity.

Does a criterion validity test reflect a certain set of abilities?

No; it's norm-referenced tests that reflect a particular set of ranked abilities. Standardized tests administered to US students in grade school and high school, along with admissions tests like the SAT, LSAT, or GRE, report the level of learning as a score compared with other test takers in the same population. A population could be all sixth graders in a school, district, state, or nation.

A criterion-referenced test, such as a driving test, determines a person’s ability to drive a car safely and obey road rules to a set standard or criterion. It measures what the individual knows or can do, rather than how they compare to others. Another example is a test at the end of a university semester, which focuses purely on how much a student knows about a certain topic. Students are not ranked; they either pass or fail.

The lowdown

Criterion validity, one of the four main types of validity, tests the correlation of a variable to a concrete outcome. If you've sat the ASVAB, SAT, or GRE, you took a norm-referenced test, which is different from a criterion-referenced test. Those tests indicate to an organization the likelihood of success.

When you design a survey or test instrument, you choose valid measures and appropriate correlation coefficients. You would also test your assessment before using it in a clinical setting to ensure construct validity, content validity, and reliability.

FAQs

What are correlation coefficients?

A correlation coefficient is a descriptive statistic that summarizes both the direction of the relationship between variables and the strength of the correlation. It ranges from -1 to 1.

A correlation coefficient of 0 denotes no linear relationship between the variables.
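If a worked example helps, the tiny sketch below uses made-up values to show data that produce coefficients of +1, -1, and roughly 0.

```python
# Tiny illustration of the coefficient's range, using invented data.
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5]
print(pearsonr(x, [2, 4, 6, 8, 10])[0])  # +1.0: perfect positive relationship
print(pearsonr(x, [10, 8, 6, 4, 2])[0])  # -1.0: perfect negative relationship
print(pearsonr(x, [4, 1, 5, 2, 3])[0])   # close to 0: essentially no linear relationship
```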

What is test-retest reliability?

Also called stability reliability, test-retest reliability refers to the clinical and research practice of administering a test twice, with an interval of a few weeks or several months between the first test and the second. This approach provides evidence of the reliability of the test.

The test administrator compares the two test results to determine their correlation. A correlation of 0.80 or above can be evidence of the test's reliability.
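A minimal sketch of that comparison, assuming invented scores from two administrations of the same test and using the 0.80 rule of thumb mentioned above, might look like this:

```python
# Sketch of test-retest reliability: correlate the scores the same people
# obtained on two administrations of a test. Scores are invented.
from scipy.stats import pearsonr

first_administration = [72, 85, 64, 90, 78, 81, 69, 88]
second_administration = [75, 83, 66, 92, 76, 84, 70, 86]

r, _ = pearsonr(first_administration, second_administration)
print(f"Test-retest correlation: r = {r:.2f}")
print("Evidence of reliability" if r >= 0.80 else "Reliability is questionable")
```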

What is construct validity?

Construct validity refers to whether a test or assessment accurately examines the construct the researcher is testing. Take the example in the article of the psychologist creating the fictional WDI and testing it alongside the existing BDI. That test scenario established the construct validity of the WDI.

When a new measure correlates strongly with established measures of the same construct, we describe it as exhibiting convergent validity. When it shows little correlation with measures of unrelated constructs, as it should, we describe it as exhibiting discriminant validity.

What is content validity?

Also referred to as logical validity, content validity refers to how comprehensively a test or assessment evaluates a construct, topic, or behavior. Determining content validity requires an expert in the field or topic the assessment is testing.

What is a valid measure?

In statistical analyses, not all measures provide valid insight into a data set. Your chosen measure needs to correspond to the construct that your assessment is testing.

For example, educators have administered the SAT since 1926. Over more than 90 years of administration, it has proven to be an accurate predictor of how well test takers will fare in college. We can say that the SAT offers a valid measure of collegiate academic success.
