Explanations
terms related to corruption and regulatory compliance
Negative Logits
amon     -0.19
inux     -0.16
idel     -0.15
andbox   -0.15
adr      -0.15
Pew      -0.15
hud      -0.15
itech    -0.15
Äįas     -0.15
инÑĥв    -0.14
Positive Logits
Bri          0.40
bribery      0.37
bri          0.37
brib         0.32
corrupt      0.27
corruption   0.25
antib        0.24
Anti         0.23
Gifts        0.23
anti         0.23
Activations Density: 0.004%
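
The negative and positive lists above can be read as the two extremes of a single token-to-logit mapping. The following is a minimal sketch of that ranking, assuming the values are held in a plain Python dict. The names `logit_effects` and `top_k` are hypothetical and not part of any dashboard API, and only a subset of the listed values is used.

```python
# Hypothetical illustration: a subset of the token/logit pairs listed above.
logit_effects = {
    "Bri": 0.40,
    "bribery": 0.37,
    "bri": 0.37,
    "corrupt": 0.27,
    "amon": -0.19,
    "inux": -0.16,
    "idel": -0.15,
}


def top_k(effects, k, positive=True):
    """Return the k (token, value) pairs with the largest or smallest values."""
    # sorted() is stable, so tied values keep their insertion order.
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


print(top_k(logit_effects, 3, positive=True))   # strongest positive tokens
print(top_k(logit_effects, 3, positive=False))  # strongest negative tokens
```

Sorting once per direction keeps the sketch simple. For a full vocabulary, a partial selection such as `heapq.nlargest` would avoid sorting every token.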