INDEX
Explanations
expressions of frustration or disbelief
Negative Logits
Token     Logit
Hmm       -0.17
ervers    -0.17
Oops      -0.16
Damn      -0.15
Yep       -0.15
acman     -0.15
Hmm       -0.15
iÅŁte     -0.15
oops      -0.14
Yup       -0.14
Positive Logits
Token     Logit
seriously  0.35
come       0.28
Wake       0.25
Seriously  0.25
surely     0.24
Seriously  0.24
Come       0.24
serious    0.23
ser        0.23
wake       0.23
Activations Density: 0.264%