INDEX
Explanations
names followed by adverbs modifying time
references to recent events or actions
Negative Logits
ILCS -0.70
breeding -0.66
abiding -0.61
Adults -0.61
ween -0.60
intensity -0.59
Textures -0.59
aimon -0.58
preference -0.58
Exper -0.57
Positive Logits
tweeted 1.48
testified 1.27
retweet 1.21
tweeting 1.19
penned 1.18
bluntly 1.17
emailed 1.16
angrily 1.15
wrote 1.14
commented 1.13
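Lists like the Negative Logits and Positive Logits above are commonly produced by projecting a feature's decoder direction onto the model's unembedding and keeping the lowest- and highest-scoring tokens. The sketch below assumes that construction; the helper name top_logits and its arguments are hypothetical, not this dashboard's actual code.

```python
def top_logits(direction, unembed, vocab, k=10):
    """Hypothetical helper: rank tokens by direction . unembed[:, j].

    direction: feature decoder vector (length d_model), assumed input
    unembed:   d_model x n_vocab matrix as nested lists, assumed layout
    vocab:     token strings, one per unembedding column
    """
    n_vocab = len(vocab)
    # Dot product of the feature direction with each token's column.
    scores = [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(n_vocab)
    ]
    # Sort token indices from most negative to most positive score.
    order = sorted(range(n_vocab), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    # Highest scores first, matching the "Positive Logits" ordering.
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return negative, positive
```

With a toy two-dimensional model, top_logits([1.0, -0.5], [[0.2, 1.0, -0.3], [0.4, -0.6, 0.1]], ["a", "b", "c"], k=1) ranks "c" as the most negative token (about -0.35) and "b" as the most positive (about 1.3).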
Activation density: 0.337%
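Activation density is usually reported as the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name activation_density and the threshold parameter are illustrative assumptions, not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds threshold.

    The strict "> threshold" rule is an assumption; some tools may differ.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)
```

For example, activation_density([0.0, 0.0, 1.2, 0.0]) returns 0.25, which would display as 25%; a value of 0.337% means the feature is active on roughly 3 of every 1,000 tokens.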