Explanations
names of political figures and their interactions
Negative Logits
Token          Logit
è¹             -0.17
eland          -0.15
strcasecmp     -0.15
.intellij      -0.14
ivet           -0.14
λά             -0.14
uptools        -0.14
haust          -0.14
abic           -0.14
elles          -0.13
Positive Logits
Token          Logit
oner            0.17
observation     0.17
825             0.16
věř             0.16
prof            0.15
alion           0.15
flats           0.15
Gret            0.14
ger             0.14
saying          0.14
Activation Density: 0.068%