INDEX
    Explanations

    apologies or expressions of regret

    Negative Logits
    Token               Logit
    "alo"               -0.15
    "imary"             -0.15
    " possibilities"    -0.14
    "ini"               -0.14
    "šet"               -0.14
    "imen"              -0.14
    "irk"               -0.14
    "alli"              -0.14
    "QUIT"              -0.14
    " jeopardy"         -0.13
    Positive Logits
    Token               Logit
    " couldn"           0.18
    "/not"              0.17
    "kus"               0.17
    " bout"             0.16
    "813"               0.16
    " meant"            0.16
    "couldn"            0.15
    "bout"              0.15
    " Mods"             0.15
    "ably"              0.15
    Act Density 0.030%

    No Known Activations