Explanations
proper nouns, particularly personal names and surnames
Negative Logits
oins     -0.19
umas     -0.15
icens    -0.15
iever    -0.15
Cho      -0.15
amura    -0.15
cdb      -0.14
odes     -0.14
s        -0.14
urch     -0.14
Positive Logits
jun          0.26
-bin         0.23
bin          0.22
bing         0.19
ying         0.19
Bin          0.19
addin        0.17
ting         0.17
OTHERWISE    0.17
bin          0.17
Activations Density: 0.027%