INDEX
Explanations
common surnames
proper nouns and names, particularly related to people and their affiliations
Negative Logits
UNE       -0.64
ゼウス    -0.60
hower     -0.58
REF       -0.57
mun       -0.56
jun       -0.55
WARN      -0.54
Leilan    -0.53
uits      -0.53
Mub       -0.52
Positive Logits
yk        0.68
iversary  0.57
iod       0.57
til       0.57
Hol       0.55
lez       0.55
gat       0.54
rium      0.53
kov       0.53
otal      0.52
Activations Density 0.505%