Explanations
References to various kinds of items or entities, particularly those categorized as "other."
Negative Logits
_ER        -0.15
ere        -0.14
pf         -0.14
borough    -0.13
EqualTo    -0.13
anda       -0.13
поч        -0.13
رسی        -0.12
aira       -0.12
hoa        -0.12
Positive Logits
alike       0.18
sund        0.15
assorted    0.15
ennie       0.15
bert        0.15
ivan        0.14
imir        0.14
igli        0.14
rame        0.14
esser       0.14
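Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below shows that idea under stated assumptions: `w_dec_row` (the feature's decoder vector), `W_U` (the unembedding matrix), and the helper name `top_logit_effects` are all hypothetical, and it ignores any final layer norm or bias the real model may apply. It is not necessarily how this dashboard computed its values.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: direct logit effect of one feature.

    w_dec_row: shape (d_model,), the feature's decoder vector (assumed).
    W_U: shape (d_model, vocab_size), the unembedding matrix (assumed).
    vocab: list of token strings, length vocab_size.
    Returns (top_negative, top_positive), each a list of (token, value).
    """
    logits = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy usage with a 2-dimensional model and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -0.2, 0.1, -0.4],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(neg)  # [('d', -0.4), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

Because the second component of the toy decoder vector is zero, the second row of `W_U` has no effect, so the ranking comes entirely from the first row.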
Activation Density: 0.033%
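Activation density is read here as the fraction of sampled tokens on which the feature's activation is above zero, so 0.033% corresponds to roughly 33 of every 100,000 tokens. The minimal sketch below assumes that definition; the function name `activation_density` and the `threshold` parameter are hypothetical, and the dashboard's exact sampling and threshold are not stated in the source.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens where the feature fires.

    acts: 1-D array of one feature's activations over a token sample.
    threshold: activations strictly above this count as firing (assumed 0).
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy usage: the feature fires on 33 out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:33] = 1.0
d = activation_density(acts)
print(f"{d:.3%}")  # 0.033%
```

A density this low marks the feature as very sparse, which is typical for SAE features tied to narrow lexical patterns.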