Explanations
phrases indicating self-identification and personal labels
Negative Logits
iasco      -0.82
ptoms      -0.80
delays     -0.75
glers      -0.72
ponds      -0.69
trap       -0.69
deaths     -0.68
ousse      -0.67
spills     -0.67
flights    -0.66
Positive Logits
libertarian    0.92
nonpartisan    0.87
ĪĴ             0.86
Libertarian    0.80
unbiased       0.80
progressive    0.79
fearless       0.78
centrist       0.78
principled     0.78
pacif          0.78
Activations Density: 0.079%