INDEX
Explanations
statistics or percentages
quantitative data and statistics related to opinions and behaviors
Negative Logits
of       -0.84
of       -0.74
Of       -0.68
oft      -0.68
OF       -0.65
Of       -0.65
OF       -0.56
didnt    -0.51
nt       -0.51
:(       -0.50
Positive Logits
versible       0.59
untarily       0.52
eligible       0.51
lette          0.50
ocrates        0.50
enson          0.50
hematically    0.47
daq            0.47
illions        0.46
quist          0.46
Activations Density: 1.088%