INDEX
    Explanations

    phrases indicating refusal or rejection of actions

    Negative Logits
    Token            Logit
    "xbf"            -0.17
    "alli"           -0.15
    "uppe"           -0.15
    "riba"           -0.14
    "mutable"        -0.14
    "mrt"            -0.14
    "olic"           -0.14
    "uala"           -0.14
    " réuss"         -0.13
    "apur"           -0.13

    Positive Logits
    Token            Logit
    " accept"         0.29
    " accepting"      0.28
    "accept"          0.27
    " accepts"        0.26
    " Accept"         0.25
    "Accept"          0.23
    "_accept"         0.22
    " allow"          0.21
    " acept"          0.21
    " acceptance"     0.21

    Act Density 0.127%

    No Known Activations