INDEX
Explanations
proper nouns, particularly names of individuals
Negative Logits
Token      Logit
umlu       -0.19
undos      -0.18
tog        -0.16
ization    -0.15
/or        -0.15
orp        -0.15
rops       -0.14
/vnd       -0.14
baiser     -0.14
theless    -0.14
Positive Logits
Token      Logit
boy        0.17
boys       0.16
ãĢħ        0.15
ιÏĩ        0.15
æ¸Ī        0.14
sik        0.14
elow       0.13
Variant    0.13
sy         0.13
erea       0.13
Activations Density: 0.287%