INDEX
Explanations
phrases indicating understanding or comprehension
expressions of understanding or empathy
Negative Logits
âĢ       -0.97
âĢİ      -0.86
à¨       -0.83
à©       -0.83
æł       -0.81
ãĤ£      -0.79
è»       -0.79
âĢ       -0.79
etheus   -0.76
à¨       -0.76
Positive Logits
;)        1.12
haha      1.05
:)        1.03
!?        1.02
:(        1.01
?!        1.01
!         0.99
...?      0.91
dude      0.90
anyways   0.90
Activations Density 0.701%