INDEX
Explanations
instances of fraud or deceitful activities
Negative Logits
itſelf          -0.73
Efq             -0.70
fortific        -0.67
intStringLen    -0.67
ValueStyle      -0.66
TagMode         -0.66
myſelf          -0.64
erec            -0.63
ſind            -0.63
<=",            -0.63
Positive Logits
scam            1.06
scammed         0.96
fooled          0.95
scams           0.94
defraud         0.88
fraud           0.88
deceived        0.87
deceive         0.83
fraude          0.83
cheated         0.82
Activations Density: 0.293%