INDEX
    Explanations

    phrases related to refusals and offers of assistance

    NEGATIVE LOGITS
    "oste"            -0.16
    " succesfully"    -0.16
    "abilidad"        -0.15
    "avia"            -0.15
    "||("             -0.15
    "omaly"           -0.15
    "erca"            -0.15
    "ãĤ¸ãĤ¢"          -0.15
    " lesbienne"      -0.14
    "lexport"         -0.14
    POSITIVE LOGITS
    " let"            0.30
    " accept"         0.29
    " accepting"      0.28
    " allow"          0.27
    " letting"        0.26
    " accepts"        0.26
    " Accept"         0.25
    "accept"          0.24
    " allowing"       0.23
    "let"             0.22
    Act Density 0.136%

    No Known Activations