INDEX
Explanations
words related to fraudulent actions or schemes
occurrences of the word "sw" in various contexts
Negative Logits
pora         -0.80
burying      -0.65
negligent    -0.65
_-           -0.64
ICAL         -0.63
NESS         -0.63
withholding  -0.63
paving       -0.63
cigarettes   -0.62
oglu         -0.62
Positive Logits
aths          0.92
sw            0.92
atches        0.90
arf           0.88
immers        0.88
athed         0.87
addle         0.85
apon          0.85
erd           0.84
anky          0.83
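Dashboards like this one usually build the positive and negative logit lists the same way. They project the feature's decoder direction through the model's unembedding matrix, then keep the highest- and lowest-scoring tokens. The sketch below follows that convention. It assumes plain dot products with no layer-norm folding or centering, and that assumption is not confirmed by this page. The helpers `logit_effects` and `top_and_bottom`, the tokens `tok_a`..`tok_d`, and all weights are hypothetical toy values, not this feature's real data.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_vec: Sequence[float],
                  unembed_cols: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction with each token's unembedding vector.

    Assumption: unembed_cols holds one d_model-length vector per vocab token
    (i.e. the columns of the unembedding matrix W_U).
    """
    return [sum(d * u for d, u in zip(decoder_vec, col)) for col in unembed_cols]

def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda i: effects[i])  # ascending by effect
    bottom = [(vocab[i], effects[i]) for i in order[:k]]            # most negative first
    top = [(vocab[i], effects[i]) for i in reversed(order[-k:])]     # most positive first
    return top, bottom

# Toy example with hypothetical tokens and weights (d_model = 2).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed_cols = [[0.9, 0.1], [0.5, 0.5], [-0.8, 0.2], [0.0, 1.0]]
decoder_vec = [1.0, 0.0]
top, bottom = top_and_bottom(logit_effects(decoder_vec, unembed_cols), vocab, k=1)
print(top, bottom)  # prints [('tok_a', 0.9)] [('tok_c', -0.8)]
```

Sorting every vocabulary index is fine for a sketch. With a full-size vocabulary, a partial selection such as `heapq.nlargest` would avoid the full sort.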
Activation density: 0.004%
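Activation density is the share of sampled token positions on which the feature fires. A value of 0.004% means roughly 4 firing tokens in every 100,000. The sketch below assumes that "fires" means an activation strictly above zero, as with a ReLU-style encoder. The dashboard's actual threshold and sample size are not stated here. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: the feature fires on 4 of 100,000 tokens.
acts = [0.0] * 100_000
for i in (10, 5_000, 42_000, 99_999):
    acts[i] = 1.3
print(f"{activation_density(acts):.3f}%")  # prints 0.004%
```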