INDEX
Explanations
Twitter handles
references to social media, specifically Twitter
Negative Logits
Token      Logit
moot       -0.61
vernment   -0.61
recomp     -0.60
anca       -0.60
itution    -0.59
rans       -0.59
pread      -0.58
999        -0.58
erella     -0.58
geries     -0.58
Positive Logits
Token        Logit
(@           0.86
@            0.76
Username     0.74
Follow       0.71
edin         0.71
Newsletter   0.71
Subscribe    0.71
Tweet        0.66
HERE         0.65
lov          0.65
Activations Density: 0.039%