| Title: | Test Reliability and CSEM in Educational Measurement |
|---|---|
| Description: | Provides functions for computing test reliability and conditional standard error of measurement (CSEM) based on the methods described in the Reliability in Educational Measurement chapter of the 5th edition of "Educational Measurement" by Lee and Harris (2025, ISBN:9780197654965). |
| Authors: | Huan Liu [aut, cre, cph], Won-Chan Lee [aut], Min Liang [aut] |
| Maintainer: | Huan Liu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-17 07:53:07 UTC |
| Source: | https://github.com/cran/emreliability |
Compute Cronbach's coefficient alpha and the associated standard error of measurement (SEM) for a set of items.
alpha(x)
| x | A data frame or matrix containing item responses, with rows as respondents (subjects) and columns as items. |
|---|---|
Cronbach's alpha is an estimate of the internal consistency reliability of a test. This implementation:
removes rows with any missing values using stats::na.exclude(),
computes the sample covariance matrix of the items,
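uses the classical formula
$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)$$
where $k$ is the number of items, $\sigma_i^2$ are the item variances,
and $\sigma_X^2$ is the variance of the total score,
computes SEM as $\sigma_X \sqrt{1 - \alpha}$.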
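The steps listed above can be sketched in base R. This is a minimal illustration, not the package's own code: the helper name `alpha_sketch`, the element names `alpha` and `sem`, and the toy data are all assumptions made for the example.

```r
# Illustrative helper (hypothetical name); mirrors the steps described above.
alpha_sketch <- function(x) {
  # listwise deletion of rows with any missing value
  x <- as.matrix(stats::na.exclude(as.data.frame(x)))
  k <- ncol(x)                       # number of items
  v <- stats::cov(x)                 # sample covariance matrix of the items
  var_total <- sum(v)                # variance of the total score
  a <- k / (k - 1) * (1 - sum(diag(v)) / var_total)
  list(alpha = a, sem = sqrt(var_total * (1 - a)))
}

# Made-up responses: 5 respondents, 3 items
toy <- data.frame(i1 = c(1, 2, 3, 4, 5),
                  i2 = c(2, 2, 4, 4, 5),
                  i3 = c(1, 3, 3, 5, 4))
res <- alpha_sketch(toy)
```

Summing every entry of the covariance matrix gives the variance of the row sums, so the total score never has to be formed explicitly.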
A named list with the following elements:
Cronbach's coefficient alpha.
Standard error of measurement (SEM) based on alpha.
data(data.u)
alpha(data.u)
Compute the conditional standard error of measurement (CSEM) and conditional standard error of scaled scores (CSSEM) under the binomial model.
csem_binomial(ni, ct = NULL)
| ni | A single numeric value indicating the number of items. |
|---|---|
| ct | An optional data frame or matrix containing a conversion table with two columns: the first column as raw scores (0 to ni) and the second column as the corresponding scale scores. |
Under the binomial model, for a test with $n$ items and a true-score
proportion $\pi$, the distribution of raw scores $X$ is assumed to be
$X \mid \pi \sim \mathrm{Binomial}(n, \pi)$. This function treats each possible raw
score $x$ as the true-score value (i.e.,
$\hat{\pi} = x / n$) and computes:
the CSEM of the raw scores; and
if ct is provided, the CSSEM of the scale scores defined in
the conversion table.
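A minimal base-R sketch of this idea follows. It assumes the raw-score CSEM is the standard deviation of a Binomial(n, x/n) variable and that the CSSEM is the standard deviation of the converted scores under that same distribution; `binomial_csem_sketch` and its `ss` argument (a vector of scale scores for raw scores 0 to ni) are hypothetical names, and `csem_binomial()` itself may differ in detail.

```r
# Illustrative helper (hypothetical name), not the package implementation.
binomial_csem_sketch <- function(ni, ss = NULL) {
  x <- 0:ni
  csem <- sqrt(x * (ni - x) / ni)          # SD of Binomial(ni, x / ni)
  out <- list(x = x, csem = csem)
  if (!is.null(ss)) {
    # ss[j + 1] is the scale score assigned to raw score j
    out$cssem <- vapply(x, function(xi) {
      p <- stats::dbinom(0:ni, ni, xi / ni)  # raw-score distribution given pi = xi / ni
      m <- sum(p * ss)                       # conditional mean scale score
      sqrt(max(sum(p * ss^2) - m^2, 0))      # conditional SD of scale scores
    }, numeric(1))
  }
  out
}

# Made-up 4-item conversion table: raw 0..4 -> scale 10, 12, 15, 19, 20
res <- binomial_csem_sketch(4, ss = c(10, 12, 15, 19, 20))
```

With an identity conversion (`ss = 0:ni`) the two columns coincide, which is a convenient sanity check.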
A list with:
A vector of raw scores from 0 to ni.
A vector of CSEM values (on the raw-score metric) for each raw score.
If ct is provided, a vector of CSSEM values for the
scale scores corresponding to each raw score.
csem_binomial(40)
csem_binomial(40, ct.u)
Compute the CSEM, CSSEM, and reliability coefficients for raw scores and scaled scores using the full compound binomial error model.
csem_compound_binomial(x, s, ct = NULL, w = NULL)
| x | Examinee-by-item matrix or data frame of item responses, with items ordered by stratum. |
|---|---|
| s | Numeric vector giving the number of items in each stratum. sum(s) must equal ncol(x). |
| ct | Optional conversion table with maxZ + 1 rows. The second column is the scale score corresponding to composite score Z = 0, 1, ..., maxZ. |
| w | Optional numeric vector of weights for each stratum. Defaults to 1 per stratum. |
A list containing:
Raw total scores (row sums of x).
If ct is provided, the composite scale score for each examinee.
CSEM on the raw-score metric for each examinee.
If ct is provided, CSSEM on the scale-score metric.
Reliability coefficient for raw scores.
If ct is provided, reliability coefficient
for scale scores.
data(data.m)
data(ct.m)
csem_compound_binomial(data.m, c(13, 12, 6))
csem_compound_binomial(data.m, c(13, 12, 6), ct.m)
Compute the CSEM for a unidimensional IRT model using either MLE- or EAP-based test information.
csem_info(theta, ip, est = c("MLE", "EAP"))
| theta | A numeric vector (or object coercible to a numeric vector) containing the ability values at which to compute CSEM. |
|---|---|
| ip | A data frame or matrix of item parameters. Columns are interpreted in the same way as in info(). |
| est | A character string specifying the estimation method: "MLE" or "EAP". |
A list containing:
theta — vector of ability values.
csemMLE — CSEM values for MLE (if est = "MLE").
csemEAP — CSEM values for EAP (if est = "EAP").
Compute Lord's CSEM in classical test theory under the binomial model.
csem_lord(ni)
| ni | A numeric value indicating the number of items (must be at least 2). |
|---|---|
A list with:
Vector of raw scores from 0 to ni.
Vector of Lord CSEM values corresponding to each raw score.
csem_lord(40)
Compute CSEM using the Lord-Keats approach, which rescales Lord's binomial-model CSEM using empirical KR-20 and KR-21 reliability estimates.
csem_lord_keats(x)
| x | A data frame or matrix of item responses, with rows as persons and columns as items. Items are assumed to be dichotomous (0/1). |
|---|---|
This function first computes Lord's CSEM under the binomial model via
csem_lord(ni), where ni = ncol(x). It then rescales the
resulting CSEM curve using the ratio
$$\sqrt{\frac{1 - \mathrm{KR20}}{1 - \mathrm{KR21}}}$$
where KR-20 and KR-21 are computed from the observed data via
kr20(x) and kr21(x), respectively.
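The rescaling can be sketched in base R as follows. The sketch assumes Lord's CSEM has the usual form sqrt(x(n - x)/(n - 1)) and that the rescaling factor is the square root of (1 - KR20)/(1 - KR21); the helper name `lord_keats_sketch` and the reliability values passed to it are made up for illustration.

```r
# Illustrative helper (hypothetical name); takes the two reliabilities as numbers.
lord_keats_sketch <- function(ni, kr20, kr21) {
  stopifnot(ni >= 2, kr21 < 1)
  x <- 0:ni
  lord <- sqrt(x * (ni - x) / (ni - 1))        # Lord's binomial-error CSEM
  scale_factor <- sqrt((1 - kr20) / (1 - kr21))  # Keats-type rescaling
  list(x = x, csem_lord = lord, csem = lord * scale_factor)
}

# Made-up reliabilities for a 40-item test
res <- lord_keats_sketch(ni = 40, kr20 = 0.85, kr21 = 0.82)
```

Because KR-21 does not exceed KR-20, the factor is at most 1, so the rescaled curve sits on or below Lord's curve.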
A list with:
Vector of raw scores from 0 to ni.
Vector of CSEM values under the Lord-Keats method.
data(data.u)
csem_lord_keats(data.u)
Implement the polynomial method for computing conditional standard errors of
measurement for scale scores (CSSEM). A polynomial regression of scale scores
on raw scores is fit for degrees 1 through K; for each degree k,
the transformation derivative is used to map raw-score CSEM values to
scale-score CSSEM values.
cssem_polynomial(csemx, ct, K = 10, gra = TRUE)
| csemx | A data frame or matrix containing raw scores and their CSEM on the raw-score metric. It must have at least the numeric columns x (raw score) and csem (raw-score CSEM). |
|---|---|
| ct | A data frame or matrix containing the score conversion table. It must have at least the numeric columns x (raw score) and ss (scale score). |
| K | Integer. Highest polynomial degree to fit. Defaults to 10. |
| gra | Logical. Defaults to TRUE. |
At the beginning of the function, csemx and ct are merged by
the x column (inner join) to create an internal data frame. Only
rows with x values present in both inputs are
used. The polynomial model ss ~ poly(x, k, raw = TRUE) is then fit
for k = 1, ..., K.
A list with two components:
A matrix with one column containing the R-squared values
from polynomial fits of degree k = 1, ..., K, where
K is the largest successfully fitted degree.
A data frame containing the merged data
(x, csem, ss) and, for each degree k,
the additional columns:
fx_k1, fx_k2, ...: transformation derivatives
for each raw score,
ss_k1, ss_k2, ...: fitted (rounded) scale scores
from the polynomial of degree k,
cssem_k1, cssem_k2, ...: CSSEM values on
the scale-score metric, computed as the transformation derivative times the raw-score CSEM, $f_k'(x) \cdot \mathrm{CSEM}(x)$.
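For a single degree k, the polynomial method can be sketched in base R as below. The helper name `poly_cssem_sketch`, the use of the absolute derivative, and the toy conversion are assumptions for illustration; `cssem_polynomial()` additionally loops over degrees, rounds fitted scale scores, and may differ in detail.

```r
# Illustrative helper (hypothetical name) for one polynomial degree k.
poly_cssem_sketch <- function(x, csem, ss, k) {
  fit <- stats::lm(ss ~ poly(x, k, raw = TRUE))
  b <- unname(stats::coef(fit))      # b[1] is the intercept, b[j + 1] multiplies x^j
  # derivative of b0 + b1 x + ... + bk x^k is sum_j j * b_j * x^(j - 1)
  fx <- vapply(x, function(xi) {
    sum(seq_len(k) * b[-1] * xi^(seq_len(k) - 1))
  }, numeric(1))
  r2 <- 1 - sum(stats::resid(fit)^2) / sum((ss - mean(ss))^2)
  list(r2 = r2,
       data = data.frame(x = x, csem = csem, ss = ss,
                         fx = fx, cssem = abs(fx) * csem))
}

# Made-up 10-item example with an exactly quadratic conversion
x <- 0:10
csem <- sqrt(x * (10 - x) / 9)       # Lord-type raw-score CSEM
ss <- 100 + 2 * x + 0.5 * x^2
res <- poly_cssem_sketch(x, csem, ss, k = 2)
```

Using `raw = TRUE` keeps the coefficients on the ordinary power basis, which is what makes the closed-form derivative above valid.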
data(ct.u)
cssem_polynomial(as.data.frame(csem_lord(40)), ct.u, K = 4, gra = TRUE)
A dataset containing the conversion table for the multidimensional data, with raw scores in the first column and scale scores in the second column.
ct.m
A data frame with 32 rows and 2 variables:
raw score
scale score
A dataset containing the conversion table for the unidimensional data, with raw scores in the first column and scale scores in the second column.
ct.u
A data frame with 41 rows and 2 variables:
raw score
scale score
A dataset containing the responses of 3000 subjects to 31 items on three subscales (13, 12, and 6 items respectively).
data.m
A data frame with 3000 rows and 31 numeric variables named
V1–V31, each representing the response to one item.
A dataset containing the responses of 3000 subjects to 40 items.
data.u
A data frame with 3000 rows and 40 numeric variables named
V1–V40, each representing the response to one item.
Compute Feldt's coefficient as an estimate of internal consistency reliability.
feldt(x)
| x | A data frame or matrix containing item responses, with rows as subjects and columns as items. |
|---|---|
A named list with:
Feldt's coefficient.
data(data.u)
feldt(data.u)
Compute test information for a unidimensional IRT model (1PL/2PL/3PL) across a vector of ability values.
info(theta, ip, est = c("MLE", "EAP"), D = 1.702)
| theta | Numeric vector of ability values at which to compute test information. |
|---|---|
| ip | A data frame or matrix of item parameters. Columns are interpreted in a fixed order (the bundled ip.u stores b parameters in the first column and a parameters in the second). |
| est | Character string indicating the estimation method: "MLE" or "EAP". |
| D | A numeric constant representing the scaling factor of the IRT model. Defaults to 1.702. |
Test information at each $\theta$ is the sum of item information, $I(\theta) = \sum_i I_i(\theta)$.
For est = "EAP", this function returns
$$I_{\mathrm{EAP}}(\theta) = I(\theta) + 1$$
where the additional 1 reflects the prior (population) contribution under a standard normal prior.
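A minimal base-R sketch of this computation for the 2PL case is given below. It assumes the usual 2PL item information D^2 a^2 P(1 - P) and the standard relation CSEM = 1/sqrt(information); the helper name `info_2pl_sketch` and its separate `b` and `a` arguments are hypothetical, and the package's info() also covers 1PL and 3PL items.

```r
# Illustrative helper (hypothetical name), 2PL only.
info_2pl_sketch <- function(theta, b, a, est = c("MLE", "EAP"), D = 1.702) {
  est <- match.arg(est)
  total <- vapply(theta, function(t) {
    p <- stats::plogis(D * a * (t - b))   # 2PL probability of a correct response
    sum(D^2 * a^2 * p * (1 - p))          # test information = sum of item information
  }, numeric(1))
  if (est == "EAP") total <- total + 1    # prior contribution under N(0, 1)
  list(theta = theta, info = total, csem = 1 / sqrt(total))
}

# Made-up item parameters for three items
res_mle <- info_2pl_sketch(c(-1, 0, 1), b = c(-0.5, 0, 0.5), a = c(1, 1.2, 0.8), est = "MLE")
res_eap <- info_2pl_sketch(c(-1, 0, 1), b = c(-0.5, 0, 0.5), a = c(1, 1.2, 0.8), est = "EAP")
```

Since the EAP information is larger by exactly 1 at every theta, the EAP-based CSEM in this sketch is always smaller than the MLE-based one.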
A list with:
Vector of ability values.
If est = "MLE", vector of test information at each theta.
If est = "EAP", vector of test information at each theta.
A dataset containing the item parameters for the unidimensional data, with b parameters
in the first column and a parameters in the second column.
ip.u
A data frame with 40 rows and 2 variables:
b parameter
a parameter
Compute the KR-20 reliability coefficient for dichotomously scored items (e.g., 0/1).
kr20(x)
| x | A data frame or matrix of item responses, with rows as persons and columns as items. Items are assumed to be dichotomous (0/1). |
|---|---|
KR-20 is an internal consistency reliability estimate for tests with
dichotomously scored items.
Rows containing missing values are removed using stats::na.exclude().
A single numeric value: the KR-20 reliability coefficient.
data(data.u)
kr20(data.u)
Compute the KR-21 reliability coefficient for dichotomously scored items (0/1), assuming equal item difficulty.
kr21(x)
| x | A data frame or matrix of item responses, with rows as persons and columns as items. Items are assumed to be dichotomous (0/1). |
|---|---|
KR-21 is a simplified alternative to KR-20, assuming equal item difficulty.
Rows containing missing values are removed using stats::na.exclude().
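The relationship between the two coefficients can be illustrated with a short base-R sketch. It uses the textbook formulas with variances computed with N in the denominator, which is an assumption (kr20() and kr21() may use a different variance convention); the helper name `kr_sketch` and the toy data are made up.

```r
# Illustrative helper (hypothetical name) returning both coefficients.
kr_sketch <- function(x) {
  x <- as.matrix(stats::na.exclude(as.data.frame(x)))  # drop incomplete rows
  k <- ncol(x)                                  # number of items
  total <- rowSums(x)
  var_total <- mean((total - mean(total))^2)    # N-denominator variance (assumption)
  p <- colMeans(x)                              # item difficulties (proportion correct)
  m <- mean(total)                              # mean total score
  c(kr20 = k / (k - 1) * (1 - sum(p * (1 - p)) / var_total),
    kr21 = k / (k - 1) * (1 - m * (k - m) / (k * var_total)))
}

# Made-up dichotomous responses: 6 persons, 4 items
toy <- rbind(c(1, 1, 1, 1),
             c(1, 1, 1, 0),
             c(1, 1, 0, 0),
             c(1, 0, 0, 0),
             c(0, 0, 0, 0),
             c(1, 1, 0, 0))
res <- kr_sketch(toy)
```

KR-21 replaces the sum of item variances with the value implied by equal item difficulties, so it cannot exceed KR-20 and matches it only when all items are equally difficult.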
A single numeric value: the KR-21 reliability coefficient.
data(data.u)
kr21(data.u)
Compute the raw score distribution for a given theta value using the Lord-Wingersky recursive formula, given item-level probabilities of a correct response.
lord_wingersky(probs)
| probs | A numeric vector (or matrix) of probabilities that an examinee at a given theta value answers each item correctly. If a matrix is provided, it will be coerced to a numeric vector. |
|---|---|
A list with:
Vector of possible raw scores, from 0 to ni.
Vector of probabilities for each raw score.
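The Lord-Wingersky recursion for dichotomous items can be sketched in a few lines of base R. The helper name `lord_wingersky_sketch` and the element names `x` and `prob` are assumptions for illustration, not the package's documented output names.

```r
# Illustrative helper (hypothetical name) for dichotomous items.
lord_wingersky_sketch <- function(probs) {
  probs <- as.numeric(probs)
  dist <- 1                       # before any item: score 0 with probability 1
  for (p in probs) {
    # adding one item: keep the score with prob 1 - p, or move up one with prob p
    dist <- c(dist * (1 - p), 0) + c(0, dist * p)
  }
  list(x = seq_along(dist) - 1, prob = dist)
}

# Made-up probabilities of a correct response on three items at some theta
res <- lord_wingersky_sketch(c(0.9, 0.6, 0.3))
```

Each pass through the loop lengthens the distribution by one score point, so ni items yield probabilities for raw scores 0 to ni.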
Generate Gaussian quadrature points and corresponding normalized weights based on the standard normal density over a symmetric interval.
normal_quadra(n, mm)
| n | Integer. Number of quadrature points (must be >= 2). |
|---|---|
| mm | Numeric. Positive value giving the maximum absolute value of the quadrature nodes (range will be from -mm to +mm). |
A list with:
Quadrature nodes from -mm to +mm.
Normalized weights proportional to the standard normal density at each node.
normal_quadra(41, 5)
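A minimal base-R sketch of such a grid is shown below. It assumes the nodes are equally spaced between -mm and +mm, which the description does not state explicitly; the helper name `normal_quadra_sketch` and the element names `nodes` and `weights` are hypothetical.

```r
# Illustrative helper (hypothetical name); equal spacing is an assumption.
normal_quadra_sketch <- function(n, mm) {
  stopifnot(n >= 2, mm > 0)
  nodes <- seq(-mm, mm, length.out = n)   # symmetric grid on [-mm, mm]
  dens <- stats::dnorm(nodes)             # standard normal density at each node
  list(nodes = nodes, weights = dens / sum(dens))  # weights normalized to sum to 1
}

q <- normal_quadra_sketch(41, 5)
```

With 41 nodes on [-5, 5], the weighted mean of the nodes is 0 and the weighted variance is very close to 1, so the grid behaves like a discretized standard normal distribution.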
Compute marginal reliability for a unidimensional IRT model using either MLE-based or EAP-based information, via Gaussian quadrature over a standard normal ability distribution.
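rel_info(ip, est)
| ip | A data frame or matrix of item parameters with columns in a fixed order (the bundled ip.u stores b parameters in the first column and a parameters in the second). |
|---|---|
| est | A character string specifying the ability estimation method: "MLE" or "EAP". |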
Gaussian quadrature with 41 nodes on [-5, 5] is used to approximate
the integrals.
A single numeric value: the marginal reliability (MLE or EAP,
depending on est).
data(ip.u)
rel_info(ip.u, "MLE")
Compute test reliability for raw scores (and optionally scale scores), along with associated conditional standard errors of measurement (CSEMs), for a unidimensional IRT model.
rel_test(ip, ct = NULL, nq = 11, D = 1.702)
| ip | A data frame or matrix of item parameters. Columns are interpreted in a fixed order (the bundled ip.u stores b parameters in the first column and a parameters in the second). |
|---|---|
| ct | Optional. A data frame or matrix containing the score conversion table. |
| nq | Integer. Number of quadrature points used to approximate the standard normal ability distribution. Defaults to 11. |
| D | Numeric. Scaling constant for the logistic IRT model. Defaults to 1.702. |
A list with three components:
A data frame containing the estimated marginal
score distribution for raw scores (and scale scores if ct is
provided).
A data frame with overall error variance,
true score variance, observed score variance, and reliability for raw
scores, and additionally for scale scores if ct is provided.
A data frame with theta, weights, expected raw
scores and corresponding CSEMs. If ct is provided, expected scale
scores and scale-score CSEMs are also included.
data(ip.u)
data(ct.u)
rel_test(ip.u)
rel_test(ip.u, ct.u)
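The raw-score part of this computation can be sketched in base R under stated assumptions: 2PL items, equally spaced quadrature nodes with normalized normal-density weights, and reliability defined as true-score variance over true-score plus error variance. The helper name `irt_rel_sketch`, the grid limit `mm = 4`, and the item parameters are all made up; rel_test() also handles scale scores and may use a different quadrature grid.

```r
# Illustrative helper (hypothetical name), raw scores and 2PL items only.
irt_rel_sketch <- function(b, a, nq = 11, D = 1.702, mm = 4) {
  theta <- seq(-mm, mm, length.out = nq)       # quadrature nodes (assumed grid)
  w <- stats::dnorm(theta)
  w <- w / sum(w)                              # normalized weights
  p <- sapply(theta, function(t) stats::plogis(D * a * (t - b)))
  p <- matrix(p, nrow = length(b))             # items in rows, nodes in columns
  tau <- colSums(p)                            # expected raw score at each node
  cvar <- colSums(p * (1 - p))                 # conditional error variance at each node
  err <- sum(w * cvar)                         # overall error variance
  tru <- sum(w * tau^2) - sum(w * tau)^2       # true-score variance
  list(conditional = data.frame(theta = theta, weight = w,
                                tau = tau, csem = sqrt(cvar)),
       error_var = err, true_var = tru,
       reliability = tru / (tru + err))
}

# Made-up 10-item test with equal discriminations
b <- seq(-2, 2, length.out = 10)
res <- irt_rel_sketch(b, a = 1)
```

Doubling the test by repeating every item quadruples the true-score variance but only doubles the error variance in this sketch, so the reliability rises, in line with the Spearman–Brown logic.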
Compute the predicted test reliability after changing test length, or compute the required test-length ratio to achieve a desired reliability, using the Spearman–Brown prophecy formula.
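spearman_brown(rxx, input, type = c("r", "l"))
| rxx | A numeric value indicating the original reliability (must be between 0 and 1, exclusive). |
|---|---|
| input | A numeric value indicating either the ratio of new test length to original test length (if type = "r") or the desired reliability (if type = "l"). |
| type | Character string specifying the calculation type: "r" to predict the reliability of the new test, or "l" to compute the required test-length ratio. |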
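The Spearman–Brown prophecy formula is:
$$\rho_{\mathrm{new}} = \frac{k\,\rho_{XX'}}{1 + (k - 1)\,\rho_{XX'}}$$
where $\rho_{XX'}$ is the original reliability and $k$ is the ratio of the
new test length to the original test length.
Solving for $k$ gives:
$$k = \frac{\rho_{\mathrm{new}}\,(1 - \rho_{XX'})}{\rho_{XX'}\,(1 - \rho_{\mathrm{new}})}$$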
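Both directions of the formula fit in one short base-R function. This is a sketch under the argument conventions described above; the helper name `spearman_brown_sketch` is hypothetical and it returns a bare number rather than the named list produced by spearman_brown().

```r
# Illustrative helper (hypothetical name) implementing both directions.
spearman_brown_sketch <- function(rxx, input, type = c("r", "l")) {
  type <- match.arg(type)
  stopifnot(rxx > 0, rxx < 1)
  if (type == "r") {
    # input is the length ratio k; return the predicted reliability
    input * rxx / (1 + (input - 1) * rxx)
  } else {
    # input is the target reliability; return the required length ratio
    stopifnot(input > 0, input < 1)
    input * (1 - rxx) / (rxx * (1 - input))
  }
}

pred <- spearman_brown_sketch(0.7, 3.86, "r")   # reliability after lengthening
need <- spearman_brown_sketch(0.7, 0.90, "l")   # ratio needed to reach 0.90
```

The two calls are inverses of each other: lengthening a test with reliability 0.7 by the ratio returned in `need` gives back a predicted reliability of 0.90.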
A named list depending on type:
Predicted reliability of the new test (if type = "r").
Required ratio of new test length to original test length (if type = "l").
spearman_brown(0.7, 3.86, "r")
spearman_brown(0.7, 0.90, "l")
Compute the stratified Cronbach's coefficient alpha for a test composed of several item strata (e.g., subtests or subscales).
stratified_alpha(x, s)
| x | A data frame or matrix containing item responses, with rows as subjects and columns as items. Items are assumed to be ordered by stratum. |
|---|---|
| s | A numeric vector giving the number of items in each stratum. The sum of s must equal ncol(x). |
Stratified alpha is an estimate of the internal consistency reliability of a
composite test formed by multiple item strata (e.g., subtests). Each stratum
reliability is computed using alpha(), and combined using the
classical stratified-alpha formula.
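A minimal base-R sketch of this combination follows, using the textbook form in which one minus the weighted sum of stratum error variances, divided by the composite variance, gives the coefficient. The helper name `stratified_alpha_sketch` and the toy data are assumptions; each stratum needs at least two items for its alpha to be defined.

```r
# Illustrative helper (hypothetical name), not the package implementation.
stratified_alpha_sketch <- function(x, s) {
  x <- as.matrix(stats::na.exclude(as.data.frame(x)))
  stopifnot(sum(s) == ncol(x), all(s >= 2))
  stratum <- rep(seq_along(s), times = s)     # stratum index of each column
  var_total <- stats::var(rowSums(x))         # variance of the composite score
  err <- 0
  for (h in seq_along(s)) {
    vh <- stats::cov(x[, stratum == h, drop = FALSE])
    kh <- ncol(vh)
    alpha_h <- kh / (kh - 1) * (1 - sum(diag(vh)) / sum(vh))  # alpha within stratum h
    err <- err + sum(vh) * (1 - alpha_h)      # stratum variance times (1 - alpha_h)
  }
  1 - err / var_total
}

# Made-up responses: 6 subjects, 4 items in two strata of 2 items each
toy <- data.frame(i1 = c(1, 2, 3, 4, 5, 3),
                  i2 = c(2, 2, 4, 4, 5, 3),
                  i3 = c(1, 3, 3, 5, 4, 2),
                  i4 = c(2, 3, 2, 5, 5, 3))
res <- stratified_alpha_sketch(toy, c(2, 2))
```

With a single stratum covering all items, the expression reduces to ordinary coefficient alpha.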
A named list with:
Stratified Cronbach's coefficient alpha.
data(data.m)
stratified_alpha(data.m, c(13, 12, 6))
Compute the stratified Feldt's coefficient for a test composed of several item strata (e.g., subtests or subscales).
stratified_feldt(x, s)
| x | A data frame or matrix containing item responses, with rows as subjects and columns as items. Items are assumed to be ordered by stratum. |
|---|---|
| s | A numeric vector giving the number of items in each stratum. The sum of s must equal ncol(x). |
Stratified Feldt's coefficient is an estimate of internal consistency reliability for a composite test formed by multiple strata.
A named list with:
Stratified Feldt's coefficient.
data(data.m)
stratified_feldt(data.m, c(13, 12, 6))