INDEX
Explanations
colloquial, informal communication markers like smileys and abbreviations
Negative Logits
recip       -0.73
ilater      -0.68
upstairs    -0.66
rex         -0.64
iqueness    -0.64
carp        -0.64
tremend     -0.63
outwe       -0.62
sburg       -0.61
rele        -0.61
Positive Logits
Latest       0.94
Ahead        0.94
Yesterday    0.94
Thousands    0.93
Dozens       0.90
Earlier      0.89
Hundreds     0.86
Riding       0.85
Researchers  0.84
Actor        0.84
Activations Density: 0.035%