INDEX
Explanations
phrases related to warnings
instances of the word "warning."
Negative Logits
Token      Logit
animate    -0.77
ophon      -0.72
hedral     -0.72
inion      -0.72
anova      -0.71
atism      -0.70
gres       -0.69
morph      -0.69
tiny       -0.67
artney     -0.65
Positive Logits
Token        Logit
warning      1.00
Signs        0.97
warnings     0.92
warning      0.90
Warning      0.89
signs        0.89
Warn         0.84
disclaimer   0.83
warn         0.80
llor         0.76
Activations Density: 0.033%