Explanations
references to bribery and corruption
Negative Logits
Token       Logit
izen        -0.74
Anthem      -0.73
stadt       -0.71
blance      -0.70
ulz         -0.69
Balanced    -0.68
ãĥ´         -0.68
geist       -0.65
ouf         -0.65
central     -0.64
Positive Logits
Token       Logit
bribes      1.01
extortion   0.98
blackmail   0.97
lure        0.87
bribe       0.86
enticing    0.84
induce      0.81
tempting    0.79
tactics     0.77
temptation  0.76
Activations Density 0.064%
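
The source does not define the density figure. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, chosen only to reproduce a value of the same size as the one reported.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, the feature fires on 64 of them.
acts = [0.0] * 99_936 + [1.5] * 64
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.064%
```

Under this reading, a density of 0.064% would mean the feature fires on roughly 6 or 7 of every 10,000 tokens, which would fit a narrow feature tied to bribery-related vocabulary.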