Explanations
proper names
proper nouns and names
Negative Logits
PDATE       -0.85
ilit        -0.84
querque     -0.80
taboola     -0.78
aution      -0.78
ilitation   -0.77
riter       -0.76
manpower    -0.75
udeb        -0.75
rador       -0.74
Positive Logits
Mae         1.17
Marie       1.07
Doe         1.07
Nicole      1.06
herself     1.02
Lynn        1.01
Anne        1.00
Rae         0.99
Marie       0.97
Jenner      0.95
Activations Density 0.210%