Explanations
female names
specific names or proper nouns in various contexts
Negative Logits
──        -0.80
acebook   -0.65
ruary     -0.64
SOURCE    -0.63
LEASE     -0.63
Cath      -0.62
EEE       -0.61
SOS       -0.61
··        -0.61
エル      -0.60
Positive Logits
hart      0.80
hair      0.80
iman      0.79
iani      0.79
utsch     0.79
ivan      0.77
ahl       0.77
tsky      0.76
oub       0.76
zynski    0.76
Activations Density: 0.278%