INDEX
Explanations
tweets from various individuals
instances of the word "tweeted."
Negative Logits
cum         -0.71
phal        -0.70
cised       -0.69
arist       -0.67
vernment    -0.67
pure        -0.65
neg         -0.61
circ        -0.61
istries     -0.61
bred        -0.60
Positive Logits
tweets      0.92
storms      0.89
tweet       0.87
Tweet       0.86
hasht       0.82
hashtag     0.82
nesday      0.80
tweeted     0.79
Tweet       0.79
tweeting    0.78
Activations Density 0.016%