    Explanations

    connective words that build lists or sequences

    Negative Logits
    <unused41>
    -1.59
     myſelf
    -1.58
    <unused43>
    -1.58
    <unused74>
    -1.58
    <unused23>
    -1.57
    <unused14>
    -1.57
    <unused42>
    -1.57
    <unused51>
    -1.57
    [@BOS@]
    -1.57
    <unused8>
    -1.57
    Positive Logits
    1.48
    ,
    1.31
    .
    1.30
      
    1.21
    ↵↵
    1.20
    :
    1.19
     (
    1.15
    '
    1.11
        
    1.08
    1.04
    Act Density 1.278%

    No Known Activations