INDEX
    Explanations

    expressions of giving up or leaving situations

    Negative Logits
    Token           Logit
    "awy"           -0.15
    "iyel"          -0.14
    "berger"        -0.14
    "rette"         -0.14
    "itra"          -0.13
    " Katz"         -0.13
    "pieces"        -0.13
    "ura"           -0.13
    "ihan"          -0.13
    "osome"         -0.13
    Positive Logits
    Token           Logit
    " nor"          0.25
    "indre"         0.18
    "Nor"           0.16
    " ani"          0.16
    " Nor"          0.16
    " slightest"    0.15
    "nor"           0.15
    "evi"           0.15
    "bs"            0.15
    "adir"          0.15
    Activation Density: 0.211%

    No Known Activations