INDEX
Explanations
mentions of the color white and related concepts
Negative Logits
queſta        -2.50
<unused43>    -2.45
<unused41>    -2.44
<unused23>    -2.44
<unused74>    -2.44
<unused42>    -2.44
[@BOS@]       -2.44
<unused14>    -2.42
<unused8>     -2.42
<unused3>     -2.42
Positive Logits
(token not rendered)   1.68
↵                      1.52
,                      1.48
↵↵                     1.43
-                      1.42
.                      1.41
(                      1.37
y                      1.34
(token not rendered)   1.30
I                      1.30
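Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The following is a minimal sketch under that assumption: the names `top_logit_effects`, `w_dec_feature`, and `w_unembed` are hypothetical, and the sketch ignores any final normalization layer the real model may apply before unembedding.

```python
import numpy as np


def top_logit_effects(w_dec_feature, w_unembed, vocab, k=10):
    """Rank tokens by how strongly one feature's decoder direction
    pushes their logits down (negative) or up (positive).

    Assumed shapes (hypothetical, for illustration):
      w_dec_feature: (d_model,)            the feature's decoder vector
      w_unembed:     (d_model, vocab_size) the model's unembedding matrix
      vocab:         list of vocab_size token strings
    """
    # Direct effect on every token's logit: one dot product per vocab column.
    logits = np.asarray(w_dec_feature) @ np.asarray(w_unembed)  # (vocab_size,)
    order = np.argsort(logits)  # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with a 2-dimensional model and a 3-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[2.0, -1.0, 0.5], [0.0, 0.0, 0.0]]),
    ["a", "b", "c"],
    k=2,
)
```

Sorting the full score vector once and reading it from both ends yields the most negative and most positive tokens in a single pass.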
Activation Density 2.546%
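Activation density is the share of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly greater than zero (the function name `activation_density` is hypothetical):

```python
import numpy as np


def activation_density(feature_acts):
    """Fraction of token positions where the feature's activation is > 0.

    The > 0 threshold is an assumption; a dashboard may use another cutoff.
    """
    acts = np.asarray(feature_acts)
    return float(np.count_nonzero(acts > 0)) / acts.size


density = activation_density([0, 0.3, 0, 0, 1.2, 0, 0, 0])
```

Here 2 of 8 positions are active, giving a density of 0.25 (25%). By the same reading, a density of 2.546% means the feature fires on roughly 25 of every 1,000 tokens.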