INDEX
Explanations
punctuation marks and expressions of enthusiasm or surprise
Negative Logits
ndef        -0.17
.TestCase   -0.16
strup       -0.16
iga         -0.15
ivre        -0.15
ypse        -0.15
vos         -0.15
イズ          -0.14
sw          -0.14
elles       -0.14
Positive Logits
tweeted      0.23
tweets       0.23
Tweets       0.22
.@           0.22
RT           0.21
tweet        0.21
@nate        0.20
             0.20
retweeted    0.19
twe          0.18
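Assume these lists follow the usual sparse-autoencoder dashboard convention. This page does not state the method. Under that convention, each value is the feature's direct effect on one token's output logit: the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The sketch below shows how such lists could be ranked. The names `decoder_dir`, `unembed`, and `logit_effects` are hypothetical. The toy vocabulary and weights are made up for illustration and do not come from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical W_U, shape [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k most positive, top-k most negative) tokens by direct logit effect.

    Assumption: score(token) = decoder_dir . unembed[:, token].
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per decoder dimension")
    # Dot the decoder direction with each vocabulary column of the unembedding.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score; the ends of the ranking give the two lists.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example with a 2-dimensional model and a 3-token vocabulary.
    pos, neg = logit_effects(
        [1.0, 0.5],
        [[0.2, 0.1, -0.1], [0.1, 0.2, -0.1]],
        ["tweet", "RT", "ndef"],
        k=1,
    )
    print(pos, neg)
```

On the toy input the top positive token is `tweet` (score about 0.25) and the top negative token is `ndef` (score about -0.15). The two lists are simply the two ends of one ranking. When `k` exceeds the number of tokens on one side of zero, the "negative" list can therefore contain positive scores.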
Activation Density 0.027%
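Activation density is presumably the percentage of token positions in the sampled dataset on which this feature fires. The sketch below assumes that "fires" means an activation strictly greater than zero. The page states neither the threshold nor the dataset, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Hypothetical counts: the feature is active on 27 of 100,000 tokens.
    acts = [0.0] * 99_973 + [1.5] * 27
    print(f"{activation_density(acts):.3f}%")
```

For instance, a feature active on 27 of 100,000 tokens would have a density of 0.027%, firing roughly once every 3,700 tokens. These counts are illustrative only and are not taken from this feature's data.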