Explanations
references to arms and arm-related concepts
Negative Logits
queſta        -1.07
beſte         -1.05
<unused43>    -1.02
<unused41>    -1.02
<unused74>    -1.02
<unused16>    -1.02
<unused42>    -1.02
<unused47>    -1.02
[@BOS@]       -1.01
<unused3>     -1.01
Positive Logits
t             0.58
United        0.53
↵             0.48
s             0.45
united        0.45
was           0.45
Got           0.45
↵↵            0.44
(             0.43
reason        0.43
Activations Density 0.739%
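
The density figure above is not defined on this page. One common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is nonzero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all, not an average of activation magnitudes.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 7 of which activate.
sample = [0.0] * 993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.739% would mean the feature fires on roughly 7 or 8 of every 1,000 sampled tokens.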