gtheory-app: a Shiny app for generalizability theory analyses
Authors/Creators
- 1. Unidade de Educação Médica, Faculdade de Medicina e Ciências Biomédicas, Universidade do Algarve
- 2. Faculdades Pequeno Príncipe
- 3. Inspirali Educação
- 4. European Board of Medical Assessors
Description
Generalizability theory (G-theory) extends classical reliability theory by partitioning measurement error into multiple, separable sources, thereby supporting the design of more dependable measurement procedures. Existing software for G-theory — most notably the GENOVA Suite and EduG — is widely used but predates contemporary R-based reproducible workflows and lacks interactive exploration of decision (D) study trade-offs. We introduce gtheory-app, a free and open-source Shiny application that brings G-theory analysis into the browser. The application accepts long-format data uploads with auto-detected delimiters or score-only files paired with a user-specified design, supports balanced and unbalanced data, and handles fully crossed, nested, and mixed (random + fixed) designs with an arbitrary number of facets. Variance components are estimated via lme4 (REML) with a classical ANOVA cross-check available for balanced data. The application then reports universe-score variance, relative and absolute error variances, the generalizability coefficient (Eρ²), the index of dependability (Φ), Φ(λ) for criterion-referenced decisions, a G-facets sensitivity table, and an interactive 3D surface graph over any pair of user-selected facets with the resulting D-study. Every numerical output is paired with a context-sensitive, plain-language interpretation that names the user's actual facets, places reliability values in conventional bands, and identifies the largest single error term as a target for redesign. We illustrate the application on three classes of designs: a fully crossed three-facet design, a nested-items design, and a multi-facet chain-nested design. We argue that gtheory-app fills a clear gap in the open-source psychometric software ecosystem and lowers the barrier to G-theory for applied researchers, educators, and assessment professionals.
CHANGELOG:
0.6.0 (May 2026)
Added
- Universe-size column for partially-fixed facets. The facets
specification table in section 2 of the sidebar now has a sixth
column, "Universe size". For a fixed facet whose universe is
larger than the sample and finite (e.g. EduG's `D = 2 of 17` in
example 16), enter the universe N here; the app then applies the
Cardinet `(1 − n/N)` factor when computing the Mixed estimates,
the Corrected estimates, the G-Study contributions, and σ²(X̄..).
For random facets and fully-sampled-fixed facets you can leave the
field blank and the app defaults to N = n.
This closes the last EduG-compatibility gap. With v0.6.0, example
16 reproduces EduG's σ²(X̄..) = 0.015 and D's absolute-error
contribution = 0.005 (27.3%) exactly.
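The `(1 − n/N)` arithmetic behind the new column can be sketched in a few lines of R. This is an illustration only: `fpc_factor()` is a hypothetical name, not a function from the app's source, and it covers fixed facets only (how the app treats random facets internally is not shown here).

```r
# Hypothetical helper, not the app's code: the (1 - n/N) factor for a
# FIXED facet with n sampled levels drawn from a universe of N levels.
# A blank "Universe size" field is modelled as NA and defaults to N = n.
fpc_factor <- function(n, N = NA) {
  if (is.na(N)) N <- n          # blank field: facet treated as fully sampled
  stopifnot(n >= 1, N >= n)     # the universe cannot be smaller than the sample
  1 - n / N
}

fpc_factor(2, 17)   # 2 of 17 levels sampled: 15/17, about 0.882
fpc_factor(2)       # fully sampled fixed facet: 0
```

With the field left blank the factor is 0, which is the v0.5.0 behavior; entering N = 17 for a facet with two sampled levels gives the 0.882 factor quoted in the v0.5.0 notes.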
Fixed (inherited from v0.5.0)
- All v0.5.0 patches are carried forward:
  - τ classification for "D + only-fixed-I" components (→ Δ_only).
  - Corrected-column sign behavior (negatives preserved).
  - Cardinet finite-population correction applied to every
    fully-fixed component, not just D-only ones.
  - σ²(X̄..) finite-population skip rule (now refined to use
    the user-specified N rather than assuming n = N).
  - Estimation-method toggle (REML vs. ANOVA Method-1).
0.5.0 (May 2026)
Fixed
- τ classification for `D + only-fixed-I` components. EduG puts
components consisting of a differentiation facet combined with
only fully-sampled-fixed instrumentation facets (e.g. MO, MK:O,
DSM, DMO, SMO on the 16-Language-Learning design) into the
absolute-error column, not into universe-score variance σ²(τ).
v0.4.0 was wrongly adding them to σ²(τ) (averaged over the fixed
I's), which inflated Coef_G. v0.5.0 routes them to Δ_only,
matching EduG. For example 16 this brings σ²(τ) from 0.0406 down
to 0.035 and Coef_G relative from 0.87 up to 0.90.
- σ²(X̄..) in mixed designs. The error-variance-of-the-grand-mean
helper now skips components that contain any fully-sampled-fixed
facet (the Cardinet (1 − n/N) factor is 0 for those, so they
contribute nothing to the grand-mean variance under the universe
of generalization). For purely random designs this is identical
to the v0.4.0 formula (verified: brennan3E gives 0.2195 either
way, so Φ(6) still matches EduG's 0.659).
Known limitation: the design table only flags facets as
"random" or "fixed". It does not yet let the user specify a
finite-but-larger universe N for partially-fixed facets such as
EduG's `D = 2 of 17`. For those facets the app treats n = N, so
the (1 − n/N) factor is 0 and the partially-fixed facet does not
contribute to σ²(X̄..). EduG, which knows N = 17 for D, applies
(1 − 2/17) = 0.882 and gets a non-zero contribution. Adding an
`n_universe` column to the design table is on the v0.6.0 roadmap.
Added
- Estimation method toggle. A new radio control in the sidebar
(section 1, just below the data input) lets the user choose between:
  - REML (lme4): the default; works for any data, balanced or
    unbalanced. Slow on saturated nested designs with many
    random-effect levels.
  - ANOVA Method-1 (closed form): balanced data only; runs in
    milliseconds because it computes SS / df / MS by
    inclusion-exclusion and inverts the EMS matrix in one shot.
    Numerically identical to EduG on every balanced design the
    application can represent. The toggle automatically falls back
    to REML if the data is unbalanced.
Use the ANOVA path when you want EduG-style instant results on
large nested designs like the EduG "16 Variations of attitudes
towards learning of German" example, which previously took
lme4 several minutes to fit.
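To make the closed-form path concrete, here is a minimal sketch for the simplest case: a balanced, fully crossed persons × items design with one score per cell. It is not the app's `.anova_decomp()`; the data and all object names are made up for illustration.

```r
# Illustrative only: made-up scores for 4 persons x 3 items.
scores <- matrix(c(4, 5, 6,
                   3, 5, 7,
                   2, 4, 4,
                   5, 6, 9), nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("p", 1:4), paste0("i", 1:3)))
n_p <- nrow(scores)
n_i <- ncol(scores)

# Sums of squares from cell means by inclusion-exclusion:
# the p x i residual is cell - person mean - item mean + grand mean.
grand  <- mean(scores)
p_mean <- rowMeans(scores)
i_mean <- colMeans(scores)
ss_p   <- n_i * sum((p_mean - grand)^2)
ss_i   <- n_p * sum((i_mean - grand)^2)
resid  <- scores - outer(p_mean, i_mean, "+") + grand
ss_pi  <- sum(resid^2)

ms <- c(p  = ss_p  / (n_p - 1),
        i  = ss_i  / (n_i - 1),
        pi = ss_pi / ((n_p - 1) * (n_i - 1)))

# Expected mean squares under the random model
# (rows = mean squares, columns = variance components):
#   E[MS_p]  = n_i * var_p               + var_pi
#   E[MS_i]  =             n_p * var_i   + var_pi
#   E[MS_pi] =                             var_pi
ems <- matrix(c(n_i, 0,   1,
                0,   n_p, 1,
                0,   0,   1), nrow = 3, byrow = TRUE)

vc <- solve(ems, ms)   # solve the EMS system in one step
vc
```

Because every step is a closed-form mean, sum, or a single linear solve, there is no iterative fitting, which is why this path is fast on balanced data.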
Fixed
- Corrected column now matches EduG's sign behavior. v0.4.0
truncated negative values in the Corrected column to zero with
`pmax(Mixed, 0)`. EduG actually keeps the sign on the Corrected
estimate (e.g. MI = −0.006 in example 10) and only applies the
finite-population correction (n−1)/n when the component consists
entirely of fully-sampled-fixed facets. v0.5.0's `.anova_decomp()`
now follows the EduG rule exactly:
  - If every facet in the component is fixed AND fully sampled,
    multiply Mixed by Π (n_f − 1) / n_f.
  - Otherwise Corrected = Mixed, including any negative sampling
    artifacts.
The downstream G coefficient calculation still treats negative
variance estimates as zero when summing into σ²(δ) and σ²(Δ), in
line with EduG's G-Study Table convention. So the display shows
the negative (matching EduG's report), but the coefficient
arithmetic uses the truncated value.
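The two-branch rule and the separate truncation step can be sketched as follows. `corrected_estimate()` and its arguments are hypothetical names chosen for this illustration, and the numeric inputs are made up rather than taken from any EduG example.

```r
# Hypothetical sketch of the Corrected-column rule, not the app's code.
# mixed      : Mixed variance estimate for one component
# n          : named vector of sample sizes of the facets in the component
# fixed_full : named logical vector, TRUE if a facet is fixed AND fully sampled
corrected_estimate <- function(mixed, n, fixed_full) {
  if (all(fixed_full[names(n)])) {
    mixed * prod((n - 1) / n)   # every facet fixed and fully sampled
  } else {
    mixed                       # unchanged, negative estimates included
  }
}

# One fixed, fully sampled facet with 2 levels: factor (2 - 1) / 2
corrected_estimate(0.10, n = c(m = 2), fixed_full = c(m = TRUE))

# A component that also contains a random facet keeps its sign for display
shown <- corrected_estimate(-0.006, n = c(m = 2, i = 10),
                            fixed_full = c(m = TRUE, i = FALSE))

# Coefficient arithmetic treats negative estimates as zero when summing
used <- pmax(shown, 0)
```

Keeping `shown` and `used` as separate values mirrors the split described above: the table displays the signed estimate, while the error-variance sums use the truncated one.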
0.4.0 (May 2026)
This release consolidates all of the v0.3.x patches into a single tagged
version.
Fixed
- SS / df / MS now populated for every component in saturated
nested designs. v0.3.0's `.anova_decomp` passed a single saturated
aov formula to R, and R silently dropped aliased terms in saturated
nested designs (e.g. the EduG "10 Dependability of change of
attitude" example), leaving the SS, df, MS, Random, Mixed, Corrected
and SE columns blank for higher-order components. v0.4.0 computes
SS and df directly from cell means via inclusion-exclusion, an
approach that is exact by construction on balanced data and applies
equally to crossed, nested, and mixed designs. Every component
in EduG's output now has its SS/df/MS reported by the app.
- G coefficients match EduG when the differentiation facet is
fully fixed. v0.3.0 used the lme4 / Mixed variance estimate
directly for σ²(τ), giving Eρ² and Φ values that drifted from EduG
whenever a differentiation facet was fully sampled and declared
fixed. v0.4.0 applies the Cardinet finite-population correction
σ²(τ) ← σ²(τ) × Π (n_f − 1)/n_f for every D-only component made
up entirely of fully-sampled-fixed facets, reproducing EduG's
"Corrected" column when it is relevant for τ. For example 10
(M differentiating, n_M = N_M = 2), Coef_G relative now reads 0.78
(matching EduG); v0.3.0 reported 0.879.
- Upload header / data column-count mismatch. EduG-exported
files frequently leave the score column un-named in the header
(the header has one fewer field than every data row). R's
`read.table(header = TRUE)` silently treats the first data field
as a row name and shifts every column one position to the left;
the downstream model fit then fails with a generic "An error has
occurred". v0.4.0 detects the mismatch up-front in `read_uploaded()`,
auto-appends `scores` (or `scores`, `V2`, ... for larger gaps),
reads the file with explicit `col.names`, and shows a yellow
"Heads up" notice in the Data preview tab explaining what was
auto-fixed.
- Case-insensitive facet / parent matching. Typing `T` as a
facet label and `t` as its parent letter (or vice versa) no longer
raises *"facet 'J' is nested in unknown facet(s): t"*. A new
internal helper `.normalize_design()` lowercases and trims both
facet labels and parent references at every entry point of the
math layer; data column names are renamed to the canonical
lowercase form inside `prepare_data()`.
- No more hard-coded facet-letter meanings. The interpretation
layer no longer assumes `t = "tasks"`, `p = "persons"`, etc. It now
always reads the user's actual column name as the descriptive name
in the plain-language interpretations; if the user labelled their
column `T` for teachers, the text says "T", not "tasks". The
bundled synthetic-example datasets were renamed to use full-word
data columns (`persons`, `items`, `raters`, etc.) so the
built-in dropdown still produces readable plain-language output.
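The header / column-count repair described in this list can be approximated with base R alone. The sketch below is an assumption about how such a check might look, not the actual `read_uploaded()`; the function name `read_with_header_fix()` and the `;` separator are illustrative.

```r
# Hypothetical sketch: pad a header that has fewer fields than the data rows.
read_with_header_fix <- function(path, sep = ";") {
  n_fields <- count.fields(path, sep = sep)
  n_header <- n_fields[1]
  n_data   <- max(n_fields[-1], na.rm = TRUE)
  header   <- scan(path, what = "", sep = sep, nlines = 1, quiet = TRUE)

  gap <- n_data - n_header
  if (gap > 0) {
    # First missing name is "scores"; further gaps become V2, V3, ...
    extra  <- c("scores", paste0("V", seq_len(gap)[-1]))
    header <- c(header, extra[seq_len(gap)])
  }
  # Skip the original header line and supply explicit column names instead
  read.table(path, sep = sep, skip = 1, header = FALSE, col.names = header)
}

# Usage: the header names two columns, every data row has three fields
tmp <- tempfile(fileext = ".csv")
writeLines(c("persons;items", "1;1;4", "1;2;5", "2;1;3", "2;2;6"), tmp)
dat <- read_with_header_fix(tmp)
names(dat)   # "persons" "items" "scores"
```

Reading with explicit `col.names` avoids the row-name shift entirely, because `read.table()` never sees the short header.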
Improved
- Facets section help text is now shorter: four short bullets
plus one expandable worked example using EduG's "07 ClassObservation"
T/JO design replace the previous long block.
0.3.0 (May 2026)
Fixed (post-release patches)
- Facet-name resolution audit. Removed the hard-coded mapping
that assumed single-letter facet labels had conventional meanings
(`t = "tasks"`, `p = "persons"`, etc.). The interpretation layer now
uses the user's actual column name as the descriptive name; if the
user labelled their column `T` for teachers, the text says "T", not
"tasks". Bundled synthetic examples were renamed to use full-word
data column names (`persons`, `items`, `raters`, etc.) so the
built-in dropdown still produces readable plain-language output.
- Case-insensitive facet/parent matching. Typing `T` as a facet
label and `t` as its parent letter (or vice versa) no longer raises
"facet 'J' is nested in unknown facet(s): t". A new internal helper
`.normalize_design()` lowercases and trims both facet labels and
parent references at every entry point of the math layer; data
column names are renamed to the canonical lowercase form inside
`prepare_data()`.
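A normalisation step of this kind might look like the sketch below. It is a guess at the general shape only: the real `.normalize_design()` may store the design differently, and the `facet` / `nested_in` column names used here are assumptions.

```r
# Hypothetical sketch: lowercase and trim facet labels and parent references.
# `design` has one row per facet; `nested_in` is a comma-separated string
# ("" when the facet is not nested).
normalize_design_sketch <- function(design) {
  design$facet <- tolower(trimws(design$facet))
  design$nested_in <- vapply(
    strsplit(design$nested_in, ",", fixed = TRUE),
    function(parents) paste(tolower(trimws(parents)), collapse = ","),
    character(1)
  )
  design
}

# Facet typed as "T", parent typed as "t " (with a stray space)
design <- data.frame(facet = c("T", " J "), nested_in = c("", "t "),
                     stringsAsFactors = FALSE)
out <- normalize_design_sketch(design)
out$facet       # "t" "j"
out$nested_in   # ""  "t"
```

After this step the parent reference `t` resolves to the facet `t`, so the "unknown facet(s)" check no longer trips on case or whitespace differences.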
Fixed
- Φ(λ) bias correction (matches EduG). The criterion-referenced
dependability coefficient Φ(λ) now uses the Brennan/Cardinet
bias-corrected formula, subtracting σ²(X̄..) — the error variance of
the grand mean — from (X̄..-λ)² in both the numerator and the
denominator. On the bundled `brennan3E.csv` dataset at λ = 6 the
application now reports Φ(6) = 0.659 (matching EduG); the previous
v0.2.x naive formula gave 0.708. The old formula is preserved as
`compute_phi_lambda_naive()` for backwards reproducibility.
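Written out, the bias-corrected coefficient described above takes the following form, where σ²(Δ) is the absolute error variance; the naive v0.2.x version is the same expression without the σ²(X̄..) terms.

```latex
\Phi(\lambda) =
  \frac{\sigma^2(\tau) + (\bar{X}_{..} - \lambda)^2 - \sigma^2(\bar{X}_{..})}
       {\sigma^2(\tau) + (\bar{X}_{..} - \lambda)^2 - \sigma^2(\bar{X}_{..}) + \sigma^2(\Delta)}
```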
Added
- EduG-style ANOVA columns in the Variance components tab.
On balanced data the components table now reports SS, df, MS, an
analytic SE on each variance component, and three flavours of the
estimate matching EduG's columns:
  - Random: naive Method-1 ANOVA recovery (treats every facet as
    random).
  - Mixed: Cardinet-style adjustment for finite-universe fixed
    facets. This is the value the application uses downstream for G,
    Φ, and Φ(λ).
  - Corrected: Mixed with negative estimates truncated to zero.
On purely random designs Random ≡ Mixed ≡ Corrected and all three
match the REML estimate to numerical precision.
- New G-Study tab mirroring EduG's "G Study Table" output, with
one row per variance component split into differentiation,
relative-error, and absolute-error contributions (with their
percentage shares), and a footer block reporting:
  - sum of variances (differentiation / relative / absolute)
  - relative and absolute SEMs
  - generalizability coefficients (relative Eρ² and absolute Φ)
  - grand mean
  - σ²(X̄..) (error variance of the mean for levels used)
  - SE of the grand mean
- σ²(X̄..) helper (`sigma2_grand_mean()`) in `R/g_coefficients.R`
that returns the error variance of the estimated grand mean over
replications of the study. This is the quantity EduG reports as
*Error variance of the mean for levels used* and is the basis of the
Φ(λ) bias correction.
- Multi-parent / chain-nesting support. The `nested_in` field
now fully supports comma-separated parent lists (e.g. `d,s` for
classes nested in D×S combinations, or `c,d,s` for pupils nested in
classes within D×S). v0.3.0 also computes the **transitive closure**
of the parent graph, so users can supply only the immediate parent
(`c` for *p*, `d,s` for *c*) and the closure rule will still
correctly require all transitive ancestors to be present. The
EduG-style "16 Variations of attitudes towards learning of German"
example with design `(p:c:DS) × m × (i:k:o)` now works in the app.
- `σ²(X̄..)` row in the G-coefficients summary panel so users can
read off the same number EduG prints as "Error variance of the mean
for levels used".
- Updated legends documenting the new columns, the new tab, and
the bias-corrected Φ(λ).
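The transitive-closure idea behind the multi-parent support can be shown with a short recursive sketch. The function `ancestors_of()` and the list layout of `parents` are hypothetical; the app's own representation is not shown in this changelog.

```r
# Hypothetical sketch: expand immediate parents into all transitive ancestors.
# `parents` is a named list mapping each facet to its immediate parents.
ancestors_of <- function(facet, parents, seen = character(0)) {
  for (p in parents[[facet]]) {
    if (!(p %in% seen)) {
      # Record p, then collect p's own ancestors
      seen <- ancestors_of(p, parents, c(seen, p))
    }
  }
  seen
}

# Pupils p nested in classes c; classes nested in d and s
parents <- list(p = "c", c = c("d", "s"),
                d = character(0), s = character(0))
closure <- lapply(setNames(names(parents), names(parents)),
                  ancestors_of, parents = parents)
closure$p   # "c" "d" "s"
```

Supplying only `c` for *p* is therefore enough: the closure adds `d` and `s`. The `seen` guard also keeps the recursion finite on malformed input, where a facet that turns up among its own ancestors signals a cycle.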
Internal
- `prepare_data()` uses transitive parents when safe-encoding nested
factor levels, so chains of arbitrary depth are no longer ambiguous
to lme4.
- `validate_design()` now detects cycles in the (transitive) parent
graph.
- `.anova_decomp()` is a new private helper that returns SS/df/MS plus
the EMS coefficient matrices for both the Random and Mixed paths in
one place; both `estimate_vc_reml()` and `estimate_vc_anova()` use
it.
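One plausible reading of "safe-encoding nested factor levels" is that each nested level label is made unique by combining it with the labels of all its ancestors. The sketch below illustrates that reading only; `encode_nested()` is a hypothetical name and the app's `prepare_data()` may work differently.

```r
# Hypothetical sketch: pupil "1" in class "a" and pupil "1" in class "b"
# are different pupils, so the class label is pasted into the pupil label.
encode_nested <- function(data, facet, ancestors) {
  cols <- as.list(data[c(ancestors, facet)])
  factor(do.call(paste, c(unname(cols), sep = ":")))
}

dat <- data.frame(c = c("a", "a", "b", "b"),
                  p = c("1", "2", "1", "2"),
                  stringsAsFactors = FALSE)
dat$p <- encode_nested(dat, "p", ancestors = "c")
levels(dat$p)   # "a:1" "a:2" "b:1" "b:2"
```

With unique labels, a mixed-model fit sees four distinct pupils instead of two, regardless of how deep the nesting chain is.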
0.2.1 (May 2026)
Changed
- Documentation DOI in the in-app introductory panel updated to
the concept DOI <https://doi.org/10.5281/zenodo.20086375>, which
resolves to the latest version of the Zenodo deposit rather than to
a specific frozen version.
- Long-format example in the data-upload help block now uses
plural column names (`persons, items, raters, scores`) and shows
additional rows so the long-format pattern is clearer to first-time
users.
Added
- App version visible in the UI. The title bar shows a `v0.2.1`
chip next to the title; a small footer at the bottom of every page
repeats the version and links to the documentation DOI.
0.2.0 (May 2026)
Added
- Always-visible introductory panel above section 1 that explains
Generalizability Theory for users new to the method, ending with the
full documentation DOI: <https://doi.org/10.5281/zenodo.20115464>.
- New file `AUDIT.md` documenting the full code audit performed for
this release: REML/lme4 correctness, G-coefficient assembly,
plain-language interpretations, and comparability with EduG and the
GENOVA Suite.
- `tests/test_vc.R` now asserts REML/ANOVA agreement to 1e-4 on
balanced data — a regression test for the v0.1.x ANOVA bug.
Fixed
- ANOVA-path EMS matrix. The expected-mean-square coefficient rule
in `estimate_vc_anova()` was inverted in v0.1.x — it computed
"facets in B but not in A" rather than the correct "facets not in
B" (Henderson Method 1 / Cornfield–Tukey rule). The default REML
path used by the UI was never affected, but ANOVA-based variance
estimates were systematically wrong for any design with unequal
facet sizes. See `AUDIT.md` §4 for the full derivation and fix.
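For a fully crossed, all-random design, the corrected coefficient rule can be sketched as below. This is an illustration under that restriction, not the code in `estimate_vc_anova()`; `ems_coef()` and the facet sizes are made up.

```r
# Hypothetical sketch for a fully crossed random design.
# Coefficient of var(B) in E[MS(A)]: zero unless B contains every facet of A;
# otherwise the product of the sample sizes of the facets NOT in B.
ems_coef <- function(A, B, n) {
  if (!all(A %in% B)) return(0)
  prod(n[setdiff(names(n), B)])   # empty product is 1 for the full interaction
}

n <- c(p = 10, i = 4, r = 3)      # unequal facet sizes
comps <- list(p = "p", i = "i", r = "r",
              pi = c("p", "i"), pr = c("p", "r"), ir = c("i", "r"),
              pir = c("p", "i", "r"))

# Rows index the mean square (A), columns index the component (B)
ems <- sapply(comps, function(B) sapply(comps, function(A) ems_coef(A, B, n)))
ems["p", ]
# p = 12, i = 0, r = 0, pi = 3, pr = 4, ir = 0, pir = 1
```

The row for MS(p) reads E[MS_p] = 12 σ²(p) + 3 σ²(pi) + 4 σ²(pr) + σ²(pir), the textbook result for a three-facet crossed random design with these sizes.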
Removed
- "Score column only" upload mode. The expand-grid-based row order
was fragile and produced incorrect facet assignments when the user's
row order did not match R's `expand.grid` convention. The long-format
upload (with automatic separator and score-column detection) is the
single supported path.
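The row-order fragility is easy to reproduce: `expand.grid()` varies its first argument fastest, so a score-only file sorted any other way is silently mislabelled. The facet names below are illustrative.

```r
# expand.grid() cycles the FIRST factor fastest
grid <- expand.grid(persons = 1:2, items = 1:3)
grid$persons   # 1 2 1 2 1 2
grid$items     # 1 1 2 2 3 3

# A file sorted by person, then item, has a different layout
truth <- data.frame(persons = rep(1:2, each = 3),
                    items   = rep(1:3, times = 2))

# Rows where the assumed facet labels disagree with the file's real order
mismatch <- grid$persons != truth$persons | grid$items != truth$items
any(mismatch)   # TRUE
```

A long-format upload carries the facet labels on every row, so no ordering convention has to be assumed.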
0.1.x (prior development)
Initial public preview. Three-facet crossed and nested designs,
four-facet crossed designs, chain-nested designs, mixed-model designs
with fixed facets, interactive 3D D-study surface, plain-language
interpretations on every output, automatic delimiter and score-column
detection.
Files
| Name | Size | Checksum |
|---|---|---|
| g-theory-app-0.6.0.zip | 40.6 kB | md5:cd5d3b25687367f404b5dd96885fa2b8 |
Additional details
Dates
- Updated: 2026-05-12 (bug fixes)