INDEX
Explanations
phrases indicating a certainty or conclusive statement
phrases indicating uncertainty or questioning
Negative Logits
iliated     -0.66
ourses      -0.63
seiz        -0.63
confir      -0.62
ourse       -0.61
pread       -0.61
tremend     -0.60
undermin    -0.59
etheus      -0.58
illary      -0.57
Positive Logits
;)          0.95
ðŁĺ         0.79
:)          0.78
:-)         0.78
haha        0.78
…)          0.75
🙂          0.75
….          0.73
?!          0.71
!           0.71
Activation Density 0.231%