INDEX
    Explanations

    requests for assistance or guidance in achieving specific tasks

    Negative Logits
    Token         Logit
    :✨            -0.80
    <unused68>    -0.77
    <unused14>    -0.77
    <unused74>    -0.76
    <unused41>    -0.76
    <unused8>     -0.76
    <unused3>     -0.76
    <unused16>    -0.76
    [@BOS@]       -0.76
    <pad>         -0.76
    Positive Logits
    Token         Logit
     is           0.57
     includes     0.28
     Allerdings   0.28
     yakni        0.28
    namely        0.28
     yaitu        0.28
     are          0.28
     involves     0.27
     appears      0.27
     consists     0.27
    Activation Density: 0.083%
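
The density figure above is commonly read as the fraction of sampled token positions on which the feature fires. The page does not state how it was computed, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and chosen purely for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 12,000 token positions, 10 of which are active.
sample = [0.0] * 11990 + [1.5] * 10
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.083%
```

Under this reading, a density of 0.083% corresponds to roughly one active position in every 1,200 sampled tokens.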

    No Known Activations