    Explanations

    ending sentences with a specific word

    Negative Logits
    Token          Value
    " touted"      0.36
    " supportive"  0.33
    " twor"        0.33
    " competes"    0.32
    "support"      0.31
    " späteren"    0.31
    " athletics"   0.30
    " nort"        0.29
    "hardware"     0.29
    " support"     0.29
    Positive Logits
    Token              Value
    " DEATH"           0.36
    "𝗢"                0.35
    " 바로"             0.34
    "ANDO"             0.33
    "!”."              0.32
    " časti"           0.32
    " 사람"             0.32
    (token not shown)  0.32
    " கொடு"            0.32
    (token not shown)  0.31
    Activation Density: 0.002%

    No Known Activations