INDEX
    Explanations

    phrases expressing the absence or lack of something

    Negative Logits
    Token        Logit
    " Yet"       -0.23
    "Yet"        -0.20
    "yet"        -0.19
    " yet"       -0.18
    " Still"     -0.17
    "dle"        -0.16
    "ugins"      -0.16
    "illo"       -0.15
    " ancora"    -0.15
    "lix"        -0.15
    Positive Logits
    Token        Logit
    " but"       0.25
    " than"      0.24
    " short"     0.23
    " other"     0.22
    " less"      0.22
    " hơn"       0.19
    "burg"       0.19
    " OTHER"     0.18
    " BUT"       0.18
    "but"        0.17
    Activation Density: 0.035%

    No Known Activations