Explanations
Twitter usernames
the presence of social media handles or mentions
Negative Logits
Icar             -0.63
consolidation    -0.61
partly           -0.61
coax             -0.58
Takeru           -0.55
cigarette        -0.53
sterile          -0.53
stricken         -0.53
wholes           -0.53
dirt             -0.53
Positive Logits
(@               4.12
ðŁ               1.62
ðŁ               1.52
@                1.50
ï¸ı              1.46
tweeted          1.39
"@               1.38
(#               1.37
âľ               1.35
ðŁij             1.35
Activations Density 0.023%