INDEX
Explanations
warnings or alerts
mentions of warnings
Negative Logits
hedral      -0.82
morph       -0.78
animate     -0.77
growth      -0.70
ablished    -0.68
ophon       -0.67
rencies     -0.66
rafted      -0.66
anova       -0.65
artney      -0.65
Positive Logits
warning      1.00
warning      0.96
warnings     0.90
Warn         0.89
Warning      0.87
disclaimer   0.85
Signs        0.81
warns        0.80
warn         0.79
warn         0.76
Activations Density 0.030%