INDEX
Explanations
words related to fraudulent activities
references to fraudulent activities
Negative Logits
Token    Logit
%"       -0.72
%:       -0.67
Own      -0.67
lihood   -0.64
Dak      -0.64
joy      -0.62
LONG     -0.60
Roses    -0.60
aple     -0.59
VILLE    -0.59
Positive Logits
Token        Logit
fraud        1.20
ulence       1.17
ulent        1.05
scam         0.92
scams        0.85
sters        0.80
ster         0.80
ously        0.79
perpetrated  0.79
Fraud        0.79
Activations Density: 0.015%