INDEX
Explanations
tweets made by different individuals
expressions indicating tweets or social media posts
Negative Logits
士         -0.76
bred      -0.70
ulhu      -0.66
phal      -0.64
姫         -0.61
ricular   -0.61
Trials    -0.60
vette     -0.59
zik       -0.59
riks      -0.59
Positive Logits
"@                0.90
realDonaldTrump   0.83
hasht             0.83
weet              0.83
URL               0.80
tweets            0.80
hashtag           0.78
ITCH              0.75
Tweet             0.75
condol            0.72
Activations Density 0.034%