INDEX
Explanations
proper nouns, particularly names such as "Susan."
mentions of the name "Susan"
Negative Logits
Token      Logit
ORD        -0.80
iculty     -0.78
cffffcc    -0.72
unct       -0.70
riter      -0.65
olkien     -0.64
erent      -0.64
psey       -0.64
warped     -0.64
ebus       -0.62
Positive Logits
Token      Logit
ne         0.97
icide      0.97
gha        0.96
jit        0.86
anne       0.86
ja         0.85
icides     0.80
atan       0.79
otte       0.79
mination   0.78
Activations Density: 0.033%