INDEX
Explanations
instances of the term "spam" in relation to unwanted or irrelevant content
words related to fraudulent activities
Negative Logits
ONSORED        -0.67
kinderg        -0.66
WIND           -0.66
anticipation   -0.61
yi             -0.61
tremend        -0.61
spirits        -0.61
unders         -0.59
REDACTED       -0.58
cooled         -0.58
Positive Logits
pling          1.16
nesty          1.11
ilies          1.04
sterdam        1.03
elia           1.01
pering         1.01
ilial          1.00
ilar           0.99
amia           0.97
essage         0.95
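Lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative token scores. The page does not say how its values were computed, so the sketch below is only an illustration under that assumption. The helper names (`logit_effects`, `top_and_bottom`), the placeholder tokens, and all numbers are hypothetical toy values, not data from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a (hypothetical) decoder direction through an unembedding.

    decoder_dir has length d_model; unembed is d_model x vocab_size.
    Returns one score per vocabulary token (dot product with each column).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    pairs = list(zip(vocab, scores))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example (made-up values): d_model = 2, vocabulary of 4 placeholder tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed = [
    [1.0, 0.5, -0.2, -0.3],  # row for residual dimension 0
    [0.0, 0.4, -0.4, -0.1],  # row for residual dimension 1
]
decoder_dir = [0.9, 0.5]

scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(scores, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting the full score list is fine for a toy vocabulary; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nlargest`) would avoid the full sort.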
Activations Density 0.021%
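A density of 0.021% means the feature fires on roughly 21 of every 100,000 tokens. The sketch below assumes density is the percentage of sampled tokens whose activation is strictly above zero; the page does not state its sampling set or threshold, and the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical sample: 21 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(0, 21 * 1000, 1000):  # indices 0, 1000, ..., 20000 (21 tokens)
    acts[i] = 1.5
density = activation_density(acts)
print(f"Activations Density {density:.3f}%")
```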