INDEX
Explanations
names, nicknames, titles, and renaming actions in text
Negative Logits
ramid      -0.74
romy       -0.73
icult      -0.70
ersen      -0.68
imeo       -0.65
yrus       -0.65
otropic    -0.65
umat       -0.64
ondo       -0.62
berra      -0.62
Positive Logits
plates         0.84
paces          0.83
synonymous     0.81
plate          0.80
naming         0.79
bestowed       0.78
slogan         0.71
ãĥĩ            0.70
recognition    0.70
aliases        0.68
Activations Density: 10.628%