INDEX
Explanations
people's names and physical attributes (such as owner, spokesman) in sentences
references to male speakers or subjects
Negative Logits
cial            -0.66
duc             -0.64
odder           -0.63
Gad             -0.61
Claire          -0.61
Fashion         -0.60
Mandatory       -0.60
Medicine        -0.58
Marriage        -0.58
Transactions    -0.57
Positive Logits
'd              1.08
'll             0.90
encount         0.88
suspic          0.83
resy            0.83
've             0.82
©¶æ             0.76
couldn          0.75
ppard           0.73
agre            0.73
Activations Density: 0.155%