INDEX
    Explanations

    instances of resistance or opposition in various contexts

    Negative Logits
    Token       Logit
    "alous"     -0.17
    "ntag"      -0.17
    "efully"    -0.15
    "rell"      -0.14
    "uncio"     -0.14
    "ittal"     -0.14
    "ạn"        -0.14
    "éĿ©"       -0.14
    "esture"    -0.13
    "ırak"      -0.13
    Positive Logits
    Token       Logit
    " back"     0.56
    "back"      0.53
    "-back"     0.43
    "Back"      0.41
    "_back"     0.40
    " BACK"     0.39
    ".back"     0.39
    " Back"     0.38
    "BACK"      0.34
    " zurück"   0.34
    Act Density 0.019%

    No Known Activations