Explanations
references to food and dining experiences
Negative Logits
Token        Logit
queſta       -1.30
<unused41>   -1.10
<unused28>   -1.09
[@BOS@]      -1.09
<unused8>    -1.09
<unused17>   -1.09
<unused23>   -1.09
<unused16>   -1.09
<pad>        -1.09
<unused14>   -1.09
Positive Logits
Token        Logit
↵            0.73
.            0.70
<em>         0.69
,            0.67
<strong>     0.66
(blank)      0.66
1            0.66
*            0.65
2            0.63
<i>          0.63
Activations Density: 0.133%