INDEX
    Explanations

    references to plants and leadership

    Negative Logits
    Token         Logit
    <unused43>    -1.09
    <unused41>    -1.09
    <unused74>    -1.09
    <unused3>     -1.08
    <unused14>    -1.08
    [@BOS@]       -1.08
    <unused23>    -1.08
    <unused42>    -1.08
    <unused17>    -1.08
    <pad>         -1.07
    Positive Logits
    Token         Logit
    ,             0.85
    (not shown)   0.83
    ↵↵            0.82
    (not shown)   0.82
    .             0.75
    1             0.73
     system       0.72
    2             0.72
    <eos>         0.70
     (            0.70
    Activation Density: 0.426%

    No Known Activations