INDEX
Explanations
instances of fraud and deception
Negative Logits
league      -0.17
deme        -0.15
tring       -0.14
ễ           -0.14
sez         -0.14
aggable     -0.14
Normalized  -0.14
anon        -0.14
Iterable    -0.14
CTL         -0.14
Positive Logits
cogn        0.18
/false      0.17
hd          0.16
false       0.16
Pump        0.15
ilden       0.15
pras        0.15
XC          0.14
leading     0.14
ona         0.14
Activations Density: 0.193%