Explanations
Proper nouns, particularly names and surnames
Negative Logits
ikh        -0.18
eig        -0.15
eczy       -0.15
tk         -0.15
po         -0.14
966        -0.14
icense     -0.13
sede       -0.13
.Bunifu    -0.13
poste      -0.13
Positive Logits
stu         0.15
一年        0.14
стати       0.14
zh          0.14
uan         0.14
mousedown   0.14
gün         0.13
(月         0.13
Sanders     0.13
Sanders     0.13
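Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the most negative and most positive entries. The sketch below assumes that convention. It also ignores the final LayerNorm and all downstream layers. The helper name `top_logit_effects`, the arrays `w_dec_row` and `W_U`, and the toy vocabulary are hypothetical, and the source does not say whether this dashboard applies any extra normalization.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank vocabulary tokens by the direct effect of one feature on their logits.

    Assumptions (hypothetical names, not taken from the dashboard):
      w_dec_row: shape (d_model,), the feature's decoder direction
      W_U:       shape (d_model, d_vocab), the model's unembedding matrix
      vocab:     list of d_vocab token strings
    The final LayerNorm and all downstream layers are ignored.
    """
    # Dot product of the decoder direction with every token's unembedding column
    effects = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape (d_vocab,)
    order = np.argsort(effects)  # ascending: most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, 0.5]])
vocab = ["tokA", "tokB", "tokC", "tokD"]
neg, pos = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(neg)  # [('tokC', -1.0), ('tokB', 0.0)]
print(pos)  # [('tokA', 1.0), ('tokD', 0.5)]
```

In byte-level BPE vocabularies the same surface string can exist as two entries, one with and one without a leading space. When the leading space is not displayed, a token may therefore appear to repeat in such a list.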
Activation Density 0.094%
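Activation density is the share of token positions on which the feature fires. A density of 0.094% corresponds to roughly 94 firing positions per 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The dashboard's actual threshold and evaluation dataset are not stated here, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds `threshold`.

    `threshold=0.0` is an assumption: any strictly positive activation counts.
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


# 94 firing positions out of 100,000 tokens
acts = np.zeros(100_000)
acts[:94] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.094%
```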