INDEX
Explanations
conditional phrases and expressions of uncertainty
after "is" or forms of "to be"
learned about, become more, too conservative, simple fact
Negative Logits
↵        -0.35
statt    -0.34
down     -0.31
1        -0.30
N        -0.30
"        -0.30
“        -0.29
|        -0.29
IN       -0.28
s        -0.28
Positive Logits
<unused41>    1.03
[@BOS@]       1.03
<unused3>     1.02
<unused8>     1.02
<unused43>    1.02
<unused51>    1.02
<unused42>    1.02
<unused28>    1.02
<unused14>    1.02
<unused16>    1.02
Activations Density 0.750%