Explanations
mentions of the word "lower"
references to decreased levels or conditions
Negative Logits
Token       Logit
Pros        -0.80
POL         -0.77
vous        -0.74
Jew         -0.72
Outbreak    -0.69
vp          -0.69
tnc         -0.68
Ze          -0.67
MO          -0.64
Ramadan     -0.63
Positive Logits
Token       Logit
iating      1.01
than        0.92
than        0.84
iation      0.84
case        0.84
extrem      0.80
downs       0.79
down        0.77
iates       0.76
pitched     0.76
Activations Density: 0.016%