Explanations
names, especially male names
recognition of individuals or personalities
Negative Logits
ruary            -0.74
FANTASY          -0.69
Fancy            -0.64
uously           -0.64
depend           -0.64
ModLoader        -0.63
Flavoring        -0.62
CW               -0.62
ãĤª              -0.61
Confederation    -0.60
Positive Logits
roth             0.77
zan              0.71
uner             0.71
onson            0.70
itzer            0.69
zen              0.65
isner            0.65
acher            0.64
opol             0.64
enhagen          0.64
Activations Density 0.089%
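
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below assumes that reading (activation strictly greater than zero counts as firing). The function name `activation_density`, the `threshold` parameter, and the sample of 100,000 tokens with 89 active are all hypothetical, chosen only so the arithmetic reproduces a value of the same size. They are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (tokens where the feature fires) / (all
    sampled tokens). The dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, the feature fires on 89 of them.
sample = [0.0] * 99_911 + [1.0] * 89
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens and fires only on a narrow set, which is consistent with a feature tied to name-like suffixes.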