Explanations
specific terms related to political affiliations and roles
Negative Logits
urt      -0.17
baugh    -0.17
asp      -0.17
eş       -0.15
ander    -0.15
adel     -0.14
isen     -0.14
isses    -0.14
Jacqu    -0.14
alara    -0.14
Positive Logits
ariant   0.15
alyze    0.15
ubl      0.15
compens  0.14
camel    0.14
ayment   0.14
資       0.13
Fem      0.13
mandate  0.13
apper    0.13
Activations Density 0.270%