INDEX
    Explanations

    numeric values and punctuation marks, particularly those relating to constraints or limits

    Negative Logits
                -0.61
                -0.60
    ↵↵          -0.58
    ,           -0.55
    .           -0.52
     "          -0.49
     l          -0.48
    <eos>       -0.48
    ↵↵↵         -0.47
                -0.47
    Positive Logits
    <unused52>  2.09
    <unused41>  2.09
    <pad>       2.08
    [@BOS@]     2.08
    <unused17>  2.08
    <unused23>  2.08
    <unused16>  2.08
    <unused14>  2.08
    <unused3>   2.08
    <unused8>   2.08
    Act Density 0.019%

    No Known Activations