Explanations
instances of the word "discuss" and its variations
Negative Logits
[missing]  -0.76
(          -0.67
↵          -0.66
.          -0.66
,          -0.65
-          -0.65
↵↵         -0.61
'          -0.59
1          -0.57
<eos>      -0.57
Positive Logits
<unused43>  1.15
<unused41>  1.13
<pad>       1.13
<unused79>  1.13
<unused23>  1.13
<unused16>  1.13
<unused17>  1.13
<unused14>  1.13
<unused3>   1.13
<unused8>   1.13
Activations Density 0.389%