Explanations
pronouns related to male individuals
references to a specific individual
Negative Logits
Token       Logit
odge        -0.70
Load        -0.68
itect       -0.67
Limit       -0.65
ammy        -0.64
owntown     -0.64
Deal        -0.64
ornia       -0.63
Start       -0.63
Sharon      -0.62
Positive Logits
Token       Logit
tremend     0.87
atically    0.74
enthusi     0.73
personally  0.71
detractors  0.70
dearly      0.70
redes       0.69
atic        0.69
occas       0.68
behav       0.67
Activations Density: 0.053%