INDEX
    Explanations

    instances of the word "discuss" and its variations

    NEGATIVE LOGITS
    Token      Logit
    [blank]    -0.76
     (         -0.67
    [blank]    -0.66
    .          -0.66
    ,          -0.65
    -          -0.65
    ↵↵         -0.61
    '          -0.59
    1          -0.57
    <eos>      -0.57
    POSITIVE LOGITS
    Token         Logit
    <unused43>    1.15
    <unused41>    1.13
    <pad>         1.13
    <unused79>    1.13
    <unused23>    1.13
    <unused16>    1.13
    <unused17>    1.13
    <unused14>    1.13
    <unused3>     1.13
    <unused8>     1.13
    Act Density 0.389%

    No Known Activations