    Explanations

    the word "but" in various contexts

    Negative Logits
    Token     Logit
    osu       -0.16
    سÙĥ       -0.14
    uire      -0.14
    mour      -0.14
    iami      -0.13
    uces      -0.13
    ença      -0.13
     Trick    -0.13
    oldem     -0.13
    uxe       -0.13
    Positive Logits
    Token     Logit
    /or       0.16
    ommen     0.14
    anes      0.14
    ect       0.14
    tems      0.14
    irl       0.14
    /OR       0.14
    ugg       0.13
    chers     0.13
    ernel     0.13
    Act Density 0.072%

    No Known Activations