INDEX
Explanations
references to political figures and governmental authority
Negative Logits
ften        -0.15
idth        -0.15
odio        -0.14
anki        -0.13
Function    -0.13
_TYP        -0.13
ichel       -0.13
aily        -0.13
ê´Ģ         -0.12
ãĥIJãĥ¼      -0.12
Positive Logits
opt         0.31
elect       0.30
decide      0.28
choose      0.28
chose       0.26
decided     0.26
bother      0.25
decides     0.24
opts        0.24
resort      0.24
Activations Density 0.154%