Explanations
mentions of spam
occurrences and discussions of spam
Negative Logits
hani      -1.12
IST       -0.75
Borders   -0.67
Fathers   -0.66
Remem     -0.65
Cel       -0.65
Syri      -0.64
Statue    -0.63
avery     -0.63
Patri     -0.63
Positive Logits
ming        1.25
spam        1.02
inator      0.92
ulent       0.83
icons       0.82
icide       0.82
vertising   0.82
ular        0.81
mers        0.81
bag         0.80
Activations Density 0.005%