INDEX
    Explanations

    phrases indicating inability or difficulties

    Negative Logits
    "jac"          -0.17
    " Jou"         -0.16
    "ymb"          -0.15
    "iner"         -0.15
    "azel"         -0.14
    "bon"          -0.14
    "uel"          -0.14
    "aller"        -0.14
    "уму"          -0.14
    "erm"          -0.14
    Positive Logits
    " harm"         0.16
    "осп"           0.16
    "ůst"           0.15
    "YLE"           0.14
    "adir"          0.14
    " Nationwide"   0.14
    " harming"      0.14
    " elsewhere"    0.14
    "ugin"          0.14
    " Rolled"       0.13
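    The two lists above give the tokens whose logits this feature pushes down and up the most. SAE dashboards commonly produce such lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. Whether this page used exactly that computation, or applied any normalization, is an assumption. The sketch below is pure Python and uses hypothetical names (`top_logit_effects`, `decoder_vec`, `unembed`, `vocab`) with a made-up toy matrix:

```python
def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Project a (hypothetical) feature decoder direction through an
    unembedding matrix and return the k most negative and k most positive
    (token, score) pairs.

    decoder_vec: list of d_model floats
    unembed:     d_model rows, each a list of d_vocab floats
    vocab:       list of d_vocab token strings
    """
    d_model = len(decoder_vec)
    # score[t] = sum over j of decoder_vec[j] * unembed[j][t]
    scores = [
        sum(decoder_vec[j] * unembed[j][t] for j in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: most negative first, most positive last.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_vec = [1.0, -1.0]
unembed = [
    [0.5, 0.1, -0.2, 0.0],
    [0.1, 0.4, 0.3, -0.3],
]
negative, positive = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print([tok for tok, _ in negative])  # ['c', 'b']
print([tok for tok, _ in positive])  # ['a', 'd']
```

    Real dashboards do the same projection with a matrix library over the full vocabulary; the nested loops here only keep the sketch dependency-free.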
    Act Density 0.075%
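    Act Density is the share of token positions on which the feature is active. Assuming "active" means an activation strictly above zero (the page does not state its threshold, so this is an assumption), the figure can be sketched with a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds
    `threshold` (assumed to be 0.0 by default)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 75 firing positions out of 100,000 tokens.
acts = [0.0] * 99_925 + [0.5] * 75
print(activation_density(acts))  # 0.00075
```

    A density of 0.075% therefore corresponds to roughly 75 active positions per 100,000 tokens.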

    No Known Activations