Explanations
personal pronouns ('he', 'him', 'his', 'she', 'her') referring to individuals
references to male pronouns
Negative Logits
etheless              -0.62
Articles              -0.62
اÙĦ                   -0.58
Services              -0.57
change                -0.56
Actress               -0.56
Communications        -0.56
Addiction             -0.55
Clintons              -0.55
isSpecialOrderable    -0.55
Positive Logits
Majesty     1.04
panic       1.00
uristic     0.92
reditary    0.91
/           0.88
mos         0.79
ures        0.79
sing        0.78
eding       0.77
bert        0.75
Activations Density 0.404%