INDEX
Explanations
phrases related to significant historical or cultural figures and their contributions
Negative Logits
Token      Logit
зу         -0.16
alphabet   -0.16
amma       -0.15
izza       -0.15
imore      -0.14
Наз        -0.14
noop       -0.14
alphabet   -0.14
rij        -0.14
ugin       -0.14
Positive Logits
Token      Logit
mon        0.45
sob        0.40
handle     0.36
tag        0.33
nick       0.32
epith      0.32
alias      0.32
nickname   0.31
sob        0.31
handle     0.30
Activations Density 0.150%