INDEX
Explanations
references to negative events or situations
the word "Bad" used in various contexts related to failure or negative evaluations
Negative Logits
Token       Logit
cale        -0.83
pulse       -0.66
rouse       -0.65
imate       -0.63
voy         -0.63
eph         -0.62
alysed      -0.62
forth       -0.61
cyl         -0.61
incorpor    -0.61
Positive Logits
Token       Logit
Bad         3.79
Bad         2.92
BAD         2.23
bad         2.14
bad         1.69
Sad         1.44
Good        1.33
Poor        1.16
Badge       1.15
Evil        1.15
Activations Density 0.014%
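
To put the density figure in perspective, the sketch below assumes that "activations density" means the fraction of sampled tokens on which the feature has a non-zero activation; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens where the feature fires) / (all sampled tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: 100,000 tokens, of which 14 activate the feature.
toy_activations = [0.0] * 99_986 + [1.0] * 14
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.014% corresponds to roughly 14 active tokens per 100,000, or about one token in every 7,000.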