Explanations
emojis and special characters
emojis and emoticons
Negative Logits
evidence    -0.63
Åį          -0.62
wages       -0.59
outh        -0.57
substance   -0.56
âĢij         -0.56
ousing      -0.56
acute       -0.56
Eight       -0.56
advanced    -0.55
Positive Logits
;)      3.39
:)      3.37
ðŁĻĤ    3.31
:-)     3.16
ðŁĺ     3.01
haha    2.34
:(      2.17
lol     1.91
XD      1.71
LOL     1.67
Activations Density 0.025%