INDEX
Explanations
names and titles related to political figures and their affiliations
Negative Logits
abble      -0.17
imli       -0.16
ью         -0.15
ayi        -0.14
yne        -0.14
Kraft      -0.14
place      -0.14
vale       -0.14
.mixin     -0.14
esi        -0.13
Positive Logits
ognition    0.16
ộn          0.15
enville     0.14
精神        0.14
spit        0.14
asher       0.14
κη          0.14
andid       0.13
947         0.13
љ           0.13
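Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that method. It is not this dashboard's own code. The function names `logit_effects` and `top_and_bottom`, the toy vocabulary, and the toy matrix are all hypothetical, and the code is pure Python so it runs without extra libraries.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U (d_model rows x vocab columns).
    Returns one score per vocabulary token (assumed method, see lead-in)."""
    d_model = len(decoder_dir)
    vocab = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
            for t in range(vocab)]

def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive

# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, -1.0]
W_U = [[2.0, 0.5, -1.0, 0.0],
       [1.0, 1.5, 1.0, -3.0]]
scores = logit_effects(decoder_dir, W_U)
negative, positive = top_and_bottom(scores, tokens, k=2)
print(negative)  # [('c', -2.0), ('b', -1.0)]
print(positive)  # [('d', 3.0), ('a', 1.0)]
```

On a real model the same two steps would run over the full vocabulary with k = 10, which matches the length of each list above.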
Activation Density: 0.014%
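Activation density is usually read as the share of tokens on which the feature fires. Under that reading, 0.014% is roughly 14 tokens in every 100,000. The sketch below assumes that "fires" means an activation strictly above zero. The helper name `activation_density`, its `threshold` parameter, and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`
    (assumed definition of 'firing')."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: 14 firing tokens out of 100,000.
acts = [0.0] * 99_986 + [0.7] * 14
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.014%
```

A dashboard may use a different threshold or token sample, so this sketch shows the arithmetic behind the figure rather than reproducing it exactly.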