Environ Toxicol Chem
2025 Mar 01; doi: 10.1093/etojnl/vgaf056.
Xeredar: An open-source R-package for the statistical analysis of endocrine new approach methods (NAMs) using fish or amphibian eleutheroembryos.
Spyridonov IM, Yan L, Szöcs E, Miranda AFP, Lange C, Tindall A, Du Pasquier D, Lemkine G, Weltje L, Habekost M, Thorbek P.
Abstract
The experimental design of New Approach Methodologies (NAMs) can deviate from that of common ecotoxicological studies and therefore often requires tailored statistical approaches. For instance, in NAMs developed for the detection of endocrine activity using aquatic vertebrate eleutheroembryos (Xenopus Eleutheroembryonic Thyroid Assay (XETA), Rapid Androgen Disruption Activity Reporter (RADAR) assay and Rapid Estrogen Activity In Vivo (REACTIV) assay), all concentration groups are nested within three independent study repeats, named 'runs' in the relevant Test Guidelines. Here, runs are referred to as replicates to emphasize their role as the repeated, independent entity. By contrast, in most other ecotoxicological studies the replicates are nested within the concentration groups. This leads to a different dependency structure for the XETA, RADAR and REACTIV assays. Disregarding this structure violates the basic statistical requirement of independent observations, potentially leading to incorrect conclusions. Unfortunately, the statistical sections of the Test Guidelines for the XETA, RADAR and REACTIV assays do not clearly recommend accounting for this dependency structure; statistical recommendations using a mixed ANOVA are provided only in the annexes. Here, we present "xeredar", an open-source R-package allowing automated statistical analysis of XETA, RADAR and REACTIV assays in which the dependency structure of the data is correctly accounted for through a mixed ANOVA. xeredar was validated on 36 XETA ring test studies and further tested on 41 RADAR ring test studies. A power analysis was carried out for the REACTIV assay, demonstrating that ignoring the dependency structure potentially leads to lower power and an increased false positive rate in comparison to the mixed ANOVA approach.
The open-source R-package "xeredar" also comes with a Shiny app, making it accessible to everyone and thereby enhancing standardization and reproducibility for the statistical analyses of XETA, RADAR, and REACTIV assays.
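The following Python sketch illustrates why the run structure described in the abstract matters. It is not xeredar's implementation (xeredar is an R package, and its API is not shown here). It simulates a balanced design in which every run contains every concentration group, then compares an F-statistic that pools all observations with one that first removes the between-run variation. For a balanced design with only a random run intercept, the second statistic is assumed here to be a reasonable stand-in for the treatment test of a mixed ANOVA. All function names, run offsets and effect sizes are hypothetical.

```python
import random
from statistics import mean


def simulate(run_offsets, conc_effects, n_per_cell, noise_sd, seed=1):
    """Return (run, concentration, value) rows; each run holds all concentration groups."""
    rng = random.Random(seed)
    rows = []
    for run, offset in enumerate(run_offsets):
        for conc, effect in enumerate(conc_effects):
            for _ in range(n_per_cell):
                # Hypothetical baseline of 100 plus run shift, treatment effect and noise.
                rows.append((run, conc, 100.0 + offset + effect + rng.gauss(0.0, noise_sd)))
    return rows


def f_statistics(rows):
    """Return (F_pooled, F_blocked) for the concentration effect in a balanced design."""
    runs = sorted({row[0] for row in rows})
    concs = sorted({row[1] for row in rows})
    values = [row[2] for row in rows]
    grand = mean(values)
    n_total = len(values)

    def ss_between(index, levels):
        # Sum of squares explained by the factor stored at position `index`.
        total = 0.0
        for level in levels:
            group = [row[2] for row in rows if row[index] == level]
            total += len(group) * (mean(group) - grand) ** 2
        return total

    ss_total = sum((v - grand) ** 2 for v in values)
    ss_conc = ss_between(1, concs)
    ss_run = ss_between(0, runs)
    df_conc = len(concs) - 1
    df_run = len(runs) - 1
    ms_conc = ss_conc / df_conc

    # Pooled: run-to-run variation stays in the error term.
    ms_resid_pooled = (ss_total - ss_conc) / (n_total - len(concs))
    # Blocked: run-to-run variation is removed from the error term.
    ms_resid_blocked = (ss_total - ss_conc - ss_run) / (n_total - len(concs) - df_run)
    return ms_conc / ms_resid_pooled, ms_conc / ms_resid_blocked


# Three runs with large baseline shifts, three concentration groups, five individuals per cell.
rows = simulate(run_offsets=[-10.0, 0.0, 10.0], conc_effects=[0.0, 0.0, -5.0],
                n_per_cell=5, noise_sd=1.0)
f_pooled, f_blocked = f_statistics(rows)
print(f"F pooled: {f_pooled:.2f}, F accounting for runs: {f_blocked:.2f}")
```

In this sketch the between-run shifts inflate the pooled error term, so the same treatment effect yields a much smaller F-statistic when the run structure is ignored. This covers only the loss of power; the increased false positive rate reported in the abstract is not reproduced here.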
Figure 1: Scenarios used for simulation. Each scenario models a different effect size in the highest treatment
group. The dashed lines indicate the highest percent effect in the highest tested concentration in the
respective scenario. The varying EC50 values affect the effect sizes in the different concentration groups.
Figure 2: Boxplots showing the distribution of the coefficients of variation (CVs) in the spiked (a) or
unspiked modes (b) of Rapid Estrogen Activity In Vivo (REACTIV) ring test studies without significant zero
inflation conducted with inert and active substances. The difference between these distributions is
statistically tested with a Welch two-sample t-test, and the t-statistic (t) as well as the corresponding p-value are shown. The red dots indicate the mean values of the corresponding CVs. Each box represents the
interquartile range (IQR), with the horizontal line inside the box denoting the median. The whiskers extend
to the minimum and maximum values within 1.5 times the IQR, while any points beyond this range are
shown as individual data points.
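As a rough illustration of the quantities named in this caption, the sketch below computes a coefficient of variation and the Welch two-sample t-statistic with its Welch-Satterthwaite degrees of freedom. The CV values are invented for illustration and are not taken from the ring test data. The p-value is omitted because it requires the t-distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev, variance


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    return 100.0 * stdev(values) / mean(values)


def welch_t(a, b):
    """Return the Welch t-statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-study CVs (percent) for inert and active substances.
inert_cvs = [18.2, 21.5, 19.9, 24.1, 20.3]
active_cvs = [22.7, 25.4, 19.8, 27.9, 23.6, 26.1]
t, df = welch_t(inert_cvs, active_cvs)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Unlike Student's t-test, the Welch variant does not assume equal variances in the two groups, which is why the degrees of freedom are generally not an integer.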
Figure 3: Distribution of adjusted Intra-class Correlation Coefficients (adjusted ICC) per lab for Rapid
Estrogen Activity In Vivo (REACTIV) ring test studies with unspiked and spiked modes. The black horizontal
line indicates the median value of all studies which is 0.14.
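To make the intra-class correlation concrete, here is a minimal sketch of a method-of-moments ICC for balanced groups, with runs as the grouping factor. It assumes that the adjusted ICC in the figure comes from a fitted mixed model as the run variance divided by the sum of run and residual variance. The estimator below is only a simplified analogue and not the computation used in the paper. The function name and example data are hypothetical.

```python
from statistics import mean


def icc_one_way(groups):
    """One-way random-effects ICC for balanced groups (each group is one run)."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced groups required")
    grand = mean(v for g in groups for v in g)
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (k * (n - 1))
    # Method-of-moments estimate of the between-run variance, truncated at zero.
    var_run = max(0.0, (ms_between - ms_within) / n)
    total = var_run + ms_within
    return var_run / total if total > 0 else 0.0


# Hypothetical measurements from one concentration group in three runs.
runs = [[98.0, 101.0, 100.0, 99.0], [104.0, 103.0, 106.0, 105.0], [100.0, 102.0, 101.0, 99.0]]
print(f"ICC = {icc_one_way(runs):.2f}")
```

Under this reading, an ICC of 0.14 would mean that about 14% of the variance not explained by the fixed effects is attributable to differences between runs.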
Figure 4: The course of percentage significant test results in the highest tested concentration of different
testing methods with respect to the simulated percentage effect. (a) contains results of studies with
treatments in spiked mode and (b) contains results of studies with treatments in unspiked mode. The
turquoise line depicts the results from the mixed Williams’ test and the dark blue line of the mixed Dunnett’s
test, the yellow line depicts the results of the pooled Dunnett’s test and the dark red line of the pooled
Dunn’s test. The horizontal line highlights the 80% power threshold that statistical tests should reach in
(eco-)toxicology. The pooled approaches use α = 0.01 and the mixed approaches use α = 0.05.
Figure 5: The percentage significant test results for concentration groups where the simulated effect size
was less than or equal to 1%. This figure is based on the studies with treatments for the spiked mode. Each
box represents the interquartile range (IQR), with the horizontal line inside the box denoting the median.
The whiskers extend to the minimum and maximum values within 1.5 times the IQR, while any points
beyond this range are shown as individual data points.
Figure 6: Boxplots showing results for all three independent replicates for the compounds arabinose (inert), anastrozole (steroidogenic: aromatase inhibitor) and prochloraz (steroidogenic: aromatase transcription inhibitor; anti-androgenic), which generated unexpected results and were incorrectly categorized after statistical reanalysis (see Table 2). False-positive finding (A(S) activity detected for a compound expected to be inert): arabinose (unspiked mode, WatchFrog). False-negative categorizations (no A(S) activity detected for compounds with a known A(S)-related MoA): anastrozole (spiked mode, IDEA) and prochloraz (spiked mode, WatchFrog).