Explanations
proper nouns, particularly names and places
Negative Logits
gba      -0.16
онÑĮ     -0.15
æĻ¨      -0.15
rab      -0.15
æĤ       -0.15
esters   -0.15
ocus     -0.15
apl      -0.14
ewing    -0.14
áno      -0.14
Positive Logits
iveness   0.17
ment      0.17
erd       0.16
triangle  0.15
illance   0.15
åĭ        0.15
Pickup    0.15
صÙĨع      0.15
romance   0.14
clas      0.14
Activations Density 0.067%