INDEX
Explanations
phrases related to highlighting or emphasizing certain points
expressions of uncertainty or anticipation about future events
Negative Logits
Token            Logit
ãĥ¼ãĥĨãĤ£        -0.71
umbn             -0.70
elfth            -0.69
andise           -0.64
Instr            -0.63
gerald           -0.62
ij士             -0.60
appropriately    -0.57
ãĥĹ              -0.56
orough           -0.55
Positive Logits
Token            Logit
)?               0.95
¶                0.90
;)               0.83
;                0.79
:)               0.78
!!!!             0.77
!!               0.77
:-)              0.76
?),              0.75
):               0.75
Activations Density 0.930%
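
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only; it assumes that "fires" means an activation strictly greater than zero, which this page does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up sample: 10,000 tokens, 93 of which have a nonzero activation.
sample = [0.0] * 9907 + [1.5] * 93
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

A low density such as this one would suggest a sparse feature that responds to a narrow set of tokens, which fits the punctuation and emoticon tokens listed under the positive logits.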