Explanations
references to warnings or alert messages
occurrences of the word "warning."
Negative Logits
animate     -0.76
morph       -0.75
anova       -0.73
ophon       -0.72
hedral      -0.71
artney      -0.68
inion       -0.67
tiny        -0.67
atism       -0.67
rencies     -0.65
Positive Logits
warning      1.02
warning      0.90
Signs        0.89
warnings     0.89
Warn         0.88
Warning      0.86
disclaimer   0.86
signs        0.83
warn         0.80
llor         0.78
Activations Density 0.036%