INDEX
    Explanations

    negative contractions and phrases indicating refusal or inability

    Negative Logits
    Token              Logit
    "923"              -0.18
    "taire"            -0.17
    "angement"         -0.15
    "reet"             -0.15
    "ÑĭÑĪ"             -0.14
    "_IL"              -0.14
    "npos"             -0.14
    "atrix"            -0.13
    "eren"             -0.13
    "NavController"    -0.13
    Positive Logits
    Token              Logit
    "chwitz"           0.15
    "że"               0.15
    "chs"              0.14
    "-linear"          0.14
    " gent"            0.14
    "ate"              0.14
    "ORTH"             0.14
    "ستÙĩ"             0.13
    "chal"             0.13
    " be"              0.13
    Act Density 0.059%

    No Known Activations