INDEX
Explanations
expressions of excitement and encouragement
Negative Logits
Token       Logit
:↵↵         -0.21
:↵          -0.19
;↵↵         -0.15
:↵↵         -0.15
.↵↵         -0.14
ırak        -0.14
ãĥ³ãĥIJ      -0.14
;↵          -0.13
jas         -0.13
:↵          -0.13
Positive Logits
Token       Logit
indeed      0.20
glad        0.18
inde        0.17
agree       0.17
Glad        0.17
definitely  0.16
,Yes        0.16
Indeed      0.15
Indeed      0.15
yes         0.15
Activations Density: 0.128%