Explanations
references to personal names and their attributes

Negative Logits
Naming      -0.25
naming      -0.23
titles      -0.21
Naming      -0.20
renaming    -0.20
(names      -0.19
姓          -0.19
nomin       -0.18
nick        -0.18
apellido    -0.18

Positive Logits
ame          0.31
na           0.31
Na           0.30
Na           0.30
na           0.30
_na          0.27
-na          0.26
name         0.25
AME          0.24
name         0.24
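
Lists like these are often produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k most negative and k most positive entries. This page does not confirm that method, so treat it as an assumption. The sketch below shows that projection. The function `top_bottom_logits`, the toy two-dimensional vectors and the four-token vocabulary are all hypothetical. A real model would use its own `d_model`-sized decoder vector and its full vocabulary.

```python
def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Project a (hypothetical) feature decoder direction through an
    unembedding matrix and return the k most negative and k most positive
    token logits as (token, logit) pairs.

    decoder_dir: list of d_model floats
    W_U: d_model rows, each a list of len(vocab) floats
    """
    # Logit for token j is the dot product of decoder_dir with column j of W_U.
    logits = [
        sum(d * row[j] for d, row in zip(decoder_dir, W_U))
        for j in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive logit.
    order = sorted(range(len(vocab)), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Walk the same ordering backwards to get the most positive logits first.
    positive = [(vocab[j], logits[j]) for j in order[::-1][:k]]
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary (hypothetical values).
vocab = ["name", "na", "titles", "the"]
W_U = [
    [0.30, 0.25, -0.20, 0.00],
    [5.00, 5.00, 5.00, 5.00],
]
decoder_dir = [1.0, 0.0]
negative, positive = top_bottom_logits(decoder_dir, W_U, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

The direct projection ignores any later layers, so it is best read as an approximate indication of which tokens the feature promotes or suppresses.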

Activation density: 0.103%
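
On dashboards of this kind, activation density is typically the fraction of sampled tokens on which the feature's activation is nonzero. Under that reading, 0.103% means the feature fires on roughly one token in a thousand. The sketch below computes that fraction under the assumed definition. The `activation_density` helper and the synthetic activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`
    (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic data: 100,000 tokens, of which 103 receive a positive activation.
acts = [0.0] * 100_000
for i in range(103):
    acts[i * 900] = 1.5  # indices 0, 900, ..., 91,800 are all distinct

density = activation_density(acts)
print(f"{density:.3%}")  # 0.103%
```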