INDEX
Explanations
sentences containing a mixture of comments, reactions, and personal reflections
Negative Logits
xtap         -0.68
quartered    -0.66
athered      -0.54
ensibly      -0.54
ocumented    -0.54
translation  -0.54
odied        -0.53
arnaev       -0.52
eatured      -0.52
solete       -0.50
Positive Logits
!".          1.33
!"           1.32
;)           1.30
:)           1.27
!!!!!        1.25
..."         1.25
haha         1.25
!'           1.25
anyways      1.24
…"           1.23
Activations Density 4.407%