Explanations
mentions of social media actions or usernames
indications of social media engagement
Negative Logits
thrust       -0.79
etheless     -0.75
tackling     -0.69
penal        -0.68
unsc         -0.68
stewards     -0.66
arers        -0.64
tug          -0.64
manoeuv      -0.64
poisoning    -0.64
Positive Logits
Associated   1.08
Follow       1.03
Anonymous    1.00
Original     0.98
FOR          0.98
OTHER        0.97
STER         0.94
EDIT         0.92
TOP          0.91
LET          0.90
Activations Density: 0.062%