Explanations
expressions of surprise or disbelief
Negative Logits
aurus    -0.16
ption    -0.15
apon     -0.15
stown    -0.14
ug       -0.14
encer    -0.14
ега      -0.14
atis     -0.14
<IM      -0.14
essim    -0.13
Positive Logits
Oh       0.17
yes      0.16
gross    0.15
338      0.15
dre      0.14
Spear    0.14
Hutch    0.14
èª       0.14
gross    0.14
Gross    0.14
Activations Density: 0.028%