INDEX
Explanations
pronouns and verbs related to what someone is saying
mentions of a female speaker or subject
Negative Logits
ornia       -0.70
atory       -0.69
Skydragon   -0.69
church      -0.65
ouver       -0.65
INGTON      -0.65
XIII        -0.64
undo        -0.64
kefeller    -0.64
erection    -0.61
Positive Logits
pher        1.43
pherd       1.34
pard        1.18
athing      1.10
athed       1.08
ppard       1.06
ffield      1.05
ldon        1.04
ikh         0.92
lly         0.88
Activations Density: 0.095%