INDEX
Explanations
references to ownership or individuality
Negative Logits
kus       -0.17
urat      -0.16
arto      -0.15
rica      -0.14
EO        -0.14
arks      -0.14
iper      -0.14
ond       -0.14
itchens   -0.14
enet      -0.14
Positive Logits
andler    0.18
version   0.17
üzel      0.15
version   0.14
abbrev    0.14
mage      0.14
ovel      0.14
WT        0.14
unique    0.14
elf       0.14
Activations Density 0.119%
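
The density figure above can be read as the share of token positions on which the feature fires. The dashboard does not say how it computes this value, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 10,000 token positions, 12 of which are active.
toy = [0.0] * 9988 + [0.7] * 12
density = activation_density(toy)
print(f"{density:.3%}")
```

A density near 0.1% would indicate a sparse feature that is active on roughly one token position in a thousand.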