Explanations
phrases related to disbelief or astonishment
expressions of disbelief or questioning reality
Negative Logits
Token        Logit
kefeller     -0.69
Lic          -0.64
athered      -0.63
ourses       -0.61
ithe         -0.61
odox         -0.59
marg         -0.59
haps         -0.59
agonists     -0.59
umbnails     -0.58
Positive Logits
Token        Logit
!"           1.10
!!!!!        1.09
haha         1.05
:)           1.05
!!"          1.05
;)           1.01
!".          0.98
!'           0.97
!!!!         0.97
?!"          0.96
Activations Density: 0.609%