Explanations
statements of fact or truth
statements that refer to factual information or opinions about societal issues
Negative Logits
addon            -0.81
bably            -0.74
arnaev           -0.73
Dispatch         -0.71
etsk             -0.70
afort            -0.69
iciary           -0.67
éĹĺ (闘)         -0.66
ulet             -0.65
ãĥĩ (デ)         -0.64
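Two of the negative tokens, éĹĺ and ãĥĩ, are written in the byte-level BPE alphabet used by GPT-2-style tokenizers, in which every raw byte is shown as a printable character. Assuming this model uses such a tokenizer (the page does not say), the minimal sketch below reverses that byte-to-character table to recover readable text. The names `bytes_to_unicode`, `BYTE_DECODER`, and `decode_token` are illustrative.

```python
# Sketch: reverse the GPT-2-style byte-to-unicode table so that vocabulary
# strings such as "ãĥĩ" can be turned back into readable UTF-8 text.
# Assumption: the model behind this page uses a GPT-2-style byte-level BPE.

def bytes_to_unicode() -> dict[int, str]:
    """Map every byte 0-255 to a printable Unicode character."""
    # Bytes that are already printable map to themselves.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    # Remaining bytes (space, control characters, etc.) are shifted to 256 and up.
    offset = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}

# Inverse table: printable character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    """Convert a byte-level BPE vocabulary string into ordinary text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A single token may hold an incomplete UTF-8 sequence; replace it if so.
    return raw.decode("utf-8", errors="replace")

for tok in ["ãĥĩ", "éĹĺ", "Ġpeople"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĥĩ is the bytes E3 83 87 (the katakana デ) and éĹĺ is E9 97 98 (the kanji 闘); the leading Ġ in the third example stands for a space byte.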
Positive Logits
psychologists     1.06
Many              1.03
Studies           0.98
women             0.96
Many              0.96
many              0.95
females           0.95
Women             0.94
people            0.94
males             0.93
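Logit lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and ranking every vocabulary token by the resulting score. The sketch below shows that ranking under stated assumptions: plain-Python vectors, no final LayerNorm scaling (this page does not say whether one is applied), and a hypothetical `top_logit_effects` helper. The toy numbers are made up for illustration and are not this feature's weights.

```python
# Sketch: rank vocabulary tokens by how strongly a feature's decoder
# direction pushes their logits up or down through the unembedding.
# Assumptions: plain-Python vectors; any final LayerNorm is ignored;
# function and argument names are hypothetical.

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def top_logit_effects(decoder_dir, unembed_rows, vocab, k=10):
    """Return (positive, negative) lists of (token, score), each of length <= k.

    decoder_dir:  the feature's direction in the residual stream (length d_model)
    unembed_rows: one d_model-length vector per vocabulary token
    vocab:        token strings aligned with unembed_rows
    """
    scores = [(tok, dot(decoder_dir, row)) for tok, row in zip(vocab, unembed_rows)]
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative

# Toy example with d_model = 2 and a three-token vocabulary.
vocab = ["women", "addon", "people"]
unembed_rows = [[0.9, 0.1], [-0.8, 0.3], [0.5, 0.0]]
pos, neg = top_logit_effects([1.0, 0.0], unembed_rows, vocab, k=1)
print(pos, neg)  # [('women', 0.9)] [('addon', -0.8)]
```

A positive score means that adding the feature's direction to the residual stream raises that token's logit; a negative score means it lowers it.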
Activation Density: 0.730%
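A density of 0.730% means the feature is active on roughly 73 of every 10,000 tokens. A minimal sketch of that statistic follows. It assumes "active" means an activation strictly above zero, which is a common convention but is not stated on this page. The `activation_density` helper is hypothetical.

```python
# Sketch: activation density as the percentage of token positions on which
# the feature fires. Assumption: "fires" means activation > threshold (0 by default).

def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```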