INDEX
Explanations
phrases related to social issues and disparities
statements about existence or identity
Negative Logits
inav        -0.82
congr       -0.71
osate       -0.69
Formation   -0.69
ESE         -0.67
iture       -0.67
ortmund     -0.67
pedia       -0.67
ileaks      -0.66
Telesc      -0.65
Positive Logits
able        1.20
incapable   1.19
unable      1.15
addicted    1.14
capable     1.08
aware       1.07
born        1.06
prone       1.05
willing     1.05
subjected   1.04
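The dashboard does not say how the logit lists above were computed. A common approach for sparse-autoencoder feature cards is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the result. The sketch below assumes that method. The function name `logit_lens`, its parameters, and the toy matrix are hypothetical and are not taken from this page.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction promotes (positive) or suppresses (negative) them,
# ASSUMING logits = decoder_dir @ W_U. Not confirmed by the dashboard.

def logit_lens(decoder_dir, W_U, vocab, k=10):
    """decoder_dir: d_model floats; W_U: d_model rows x len(vocab) columns."""
    scores = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: most suppressed tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example (made-up numbers, d_model = 2, four-token vocabulary).
vocab = ["able", "unable", "pedia", "ileaks"]
W_U = [
    [1.0, 0.9, -0.5, -0.4],
    [0.2, 0.3, -0.1, -0.4],
]
decoder_dir = [1.0, 0.5]
positive, negative = logit_lens(decoder_dir, W_U, vocab, k=2)
print(positive)
print(negative)
```

With real model weights, `W_U` would have tens of thousands of columns, so a matrix library would replace the nested Python loops; the ranking logic stays the same.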
Activation Density: 0.481%
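Activation density is usually read as the percentage of tokens in a sample on which the feature fires. The page does not state the sample or the firing threshold, so the sketch below assumes "fires" means an activation strictly above a threshold of 0.0. The helper `activation_density` is hypothetical.

```python
# Hypothetical sketch: percentage of tokens whose feature activation exceeds
# a threshold. The threshold (0.0) and the sample are ASSUMPTIONS; the
# dashboard does not specify how its 0.481% figure was measured.

def activation_density(activations, threshold=0.0):
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy sample: 481 of 100,000 tokens fire, matching a 0.481% density.
sample = [0.0] * 99_519 + [1.0] * 481
print(activation_density(sample))
```

A density this low means the feature is silent on the vast majority of tokens, which is typical of the sparse features such dashboards describe.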