Explanations
mentions of "spam" or "spoof."
occurrences of the substring "sp" in words or phrases
Negative Logits
Token         Logit
âĸ¬âĸ¬        -0.68
pigeon        -0.62
purse         -0.61
houses        -0.59
quo           -0.59
presumptive   -0.59
guarding      -0.58
dispatch      -0.58
ĪĴ            -0.57
hon           -0.57
Positive Logits
Token         Logit
atial         1.38
aghetti       1.37
oiler         1.36
iral          1.28
ooky          1.23
aces          1.21
herical       1.16
inning        1.14
oil           1.13
resso         1.13
Activations Density 0.024%
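
The explanations and the positive logits can be read together. One possible reading, which the source does not state, is that the feature fires on the fragment "sp" and raises the logits of tokens that would complete a word after it. The sketch below joins "sp" to each positive-logit token listed above so the resulting strings can be inspected by eye. The prefix and the assumption that these tokens attach directly, with no leading space, are illustrative choices and not part of the dashboard.

```python
# Assumed prefix, taken from the explanation text ('the substring "sp"').
PREFIX = "sp"

# Top positive-logit tokens and values, copied from the listing above.
positive_logits = [
    ("atial", 1.38),
    ("aghetti", 1.37),
    ("oiler", 1.36),
    ("iral", 1.28),
    ("ooky", 1.23),
    ("aces", 1.21),
    ("herical", 1.16),
    ("inning", 1.14),
    ("oil", 1.13),
    ("resso", 1.13),
]

# Join the assumed prefix to each token; this is plain string concatenation
# and makes no claim about how the tokenizer itself splits these words.
completions = [(PREFIX + token, logit) for token, logit in positive_logits]

for word, logit in completions:
    print(f"{word:<12}{logit:.2f}")
```

Most of the joined strings are ordinary English words (for example "spatial" and "spiral"), while "spresso" is only a word fragment, so the reading fits the top tokens well but not uniformly.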