INDEX
Explanations
words related to approval or disapproval of government actions
Negative Logits
Token       Logit
Goth        -0.67
Equality    -0.66
REDACTED    -0.66
Immunity    -0.65
ISO         -0.65
FD          -0.63
[&          -0.62
imony       -0.59
Danish      -0.59
Nikon       -0.58
Positive Logits
Token       Logit
ving        1.76
ved         1.53
vable       1.49
ves         1.45
vers        1.43
ven         1.41
ptions      1.27
ption       1.26
pping       1.24
pling       1.22
Activations Density: 0.044%