INDEX
    Explanations

    references to muscle-related terms and their associated contexts

    Negative Logits
    Token           Logit
    ","             -0.44
                    -0.40
    " better"       -0.39
    "("             -0.39
    "↵↵"            -0.38
    "min"           -0.37
    "from"          -0.37
    "'"             -0.37
    " follow"       -0.36
    " generally"    -0.36
    Positive Logits
    Token           Logit
    " '\\;'"        1.08
    " ſind"         1.05
    " queſta"       1.02
                    1.00
    "<unused43>"    0.98
    "<unused14>"    0.98
    "<unused74>"    0.97
    "<unused41>"    0.97
    "<unused80>"    0.97
    "[@BOS@]"       0.97
    Activation Density: 0.209%

    No Known Activations