INDEX
Explanations
female first names
names and proper nouns, particularly related to people and characters
Negative Logits
ickr            -0.78
anyl            -0.76
bably           -0.71
deterrent       -0.68
sleeper         -0.68
uggest          -0.68
commercially    -0.67
dictators       -0.66
preempt         -0.64
undown          -0.63
Positive Logits
hyde            1.03
abeth           0.90
cia             0.89
Rae             0.82
ette            0.82
Anne            0.80
Marie           0.78
otte            0.78
Chal            0.77
Cohn            0.77
Activations Density 0.307%