INDEX
Explanations
punctuation marks indicating a question, excitement, or an ending
punctuation marks indicating rhetorical questions or exclamations
Negative Logits
ibles          -0.75
anniversary    -0.70
fid            -0.66
clus           -0.64
bisc           -0.63
administrator  -0.62
izons          -0.62
elig           -0.61
aband          -0.61
scrim          -0.61
Positive Logits
Anyway         1.22
Similarly      0.99
Conversely     0.97
Nevertheless   0.95
Interestingly  0.91
Likewise       0.91
Nonetheless    0.90
Therefore      0.89
Anyway         0.89
Finally        0.89
Activations Density: 0.116%