Explanations
words related to political scandals and corruption
NEGATIVE LOGITS
MED -0.74
yna -0.73
icians -0.73
achus -0.70
rad -0.69
é¾įå -0.69
angel -0.69
kins -0.68
atel -0.68
moderate -0.67
POSITIVE LOGITS
lihood 2.16
tendencies 1.01
structures 0.96
liest 0.95
structure 0.95
qualities 0.89
behavior 0.87
minded 0.87
liness 0.85
behaviors 0.84
Activation Density: 0.028%