INDEX
Explanations
mentions of warnings or cautionary statements
instances of the word "warning" and related phrases
Negative Logits
Token       Logit
animate     -0.73
anova       -0.72
hedral      -0.69
ophon       -0.66
morph       -0.65
inion       -0.65
ional       -0.65
uate        -0.65
growth      -0.64
ablished    -0.64
Positive Logits
Token       Logit
Signs       1.04
warning     1.03
signs       0.96
warnings    0.90
warning     0.88
warn        0.84
Warn        0.84
Warning     0.83
disclaimer  0.81
tale        0.77
Activations Density 0.035%