Explanations
proper nouns, particularly names and places
Negative Logits
Domestic      -0.17
gend          -0.17
anol          -0.15
/goto         -0.14
Accessor      -0.14
shit          -0.14
aging         -0.14
ollar         -0.14
Aging         -0.14
icker         -0.14

Positive Logits
deen           0.32
yst            0.23
LOUR           0.22
YST            0.20
ild            0.18
ух             0.16
corn           0.15
accordingly    0.15
obic           0.15
iginal         0.15
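A minimal sketch of how top-logit lists like the two above are commonly produced in sparse-autoencoder dashboards, assuming (this page does not say so) that the feature's decoder direction is projected through the model's unembedding matrix and the most negative and most positive entries are kept. The names `top_logit_effects`, `w_dec_row`, `W_U` and `vocab` are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: w_dec_row is the feature's decoder direction, shape (d_model,),
    and W_U is the unembedding matrix, shape (d_model, d_vocab).
    """
    logits = w_dec_row @ W_U          # direct effect on each vocab logit, shape (d_vocab,)
    order = np.argsort(logits)        # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

With toy inputs `w_dec_row = np.array([1.0, 0.0])` and a 2×4 `W_U` whose first row is `[0.3, -0.2, 0.1, 0.0]`, calling with `k=2` puts tokens 1 and 3 in the negative list and tokens 0 and 2 in the positive list.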
Activation Density 0.008%
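Activation density is usually reported as the percentage of sampled tokens on which the feature fires. Assuming "fires" means an activation above zero (the threshold this page uses is not stated), 0.008% corresponds to about 8 firings per 100,000 tokens. A minimal sketch, with the hypothetical helper name `activation_density`:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`.

    Assumption: `acts` holds one feature's activation for every sampled token.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```

For example, 100,000 activations of which exactly 8 are positive give a density of 0.008%.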