INDEX
Explanations
references to political figures and their actions or statements
Negative Logits
lich        -0.15
anel        -0.14
事          -0.14
lam         -0.14
.nasa       -0.14
esper       -0.13
crap        -0.13
항          -0.13
Rain        -0.13
イル        -0.13
Positive Logits
should      0.29
should      0.24
Should      0.23
Should      0.23
ought       0.22
shouldn     0.20
.should     0.17
deber       0.17
920         0.17
SHOULD      0.17
Activations Density 0.179%