Explanations
mentions of Twitter activity in the form of tweets
instances of the word "tweeted"
Negative Logits
cised    -0.79
phal     -0.75
arist    -0.75
cum      -0.73
pure     -0.71
psy      -0.70
姫       -0.68
esan     -0.65
thin     -0.64
circ     -0.64
Positive Logits
tweeted   1.02
tweets    1.01
tweet     0.98
tweeting  0.91
Tweet     0.89
hasht     0.88
storms    0.87
hashtag   0.85
Tweet     0.81
storm     0.80
Activations Density 0.009%