Explanations
instances of direct speech or dialogue
Negative Logits
[@BOS@]      -0.83
<unused23>   -0.83
<unused8>    -0.82
<unused14>   -0.82
<unused51>   -0.82
<unused68>   -0.82
<unused47>   -0.82
<unused42>   -0.82
<unused28>   -0.82
<unused41>   -0.82
Positive Logits
'            0.35
Is           0.31
What         0.31
1            0.30
2            0.28
I            0.28
SP           0.27
You          0.27
、           0.27
If           0.27
Activations Density 0.023%