INDEX
Explanations
spam-related keywords or phrases
references to spam and related concepts
Negative Logits
hani      -0.83
姫        -0.73
Became    -0.67
Cel       -0.67
Heb       -0.65
Rite      -0.64
Mart      -0.64
Vernon    -0.63
Patri     -0.61
Beck      -0.61
Positive Logits
spam        1.23
ming        1.14
inator      0.89
ulent       0.82
vertising   0.81
ulence      0.81
ular        0.81
ulus        0.79
bugs        0.79
bots        0.78
Activations Density: 0.007%