INDEX
Explanations
social media links prompting to follow accounts
instances of social media follow requests
Negative Logits
minist     -0.75
pite       -0.73
ILCS       -0.67
wagen      -0.63
lob        -0.59
halluc     -0.59
endiary    -0.58
ãĥ¯ãĥ³     -0.58
coron      -0.58
nerv       -0.57
Positive Logits
@          1.03
HuffPost   0.89
Stories    0.88
ers        0.87
us         0.85
ership     0.84
Us         0.81
ed         0.77
VICE       0.75
me         0.74
Activations Density 0.018%