INDEX
Explanations
references to muscle-related terms and their associated contexts
Negative Logits
,            -0.44
↵            -0.40
better       -0.39
(            -0.39
↵↵           -0.38
min          -0.37
from         -0.37
'            -0.37
follow       -0.36
generally    -0.36
Positive Logits
'\\;'        1.08
ſind         1.05
queſta       1.02
             1.00
<unused43>   0.98
<unused14>   0.98
<unused74>   0.97
<unused41>   0.97
<unused80>   0.97
[@BOS@]      0.97
Activations Density 0.209%