INDEX
    Explanations

    phrases indicating refusal or resistance to comply with requests or commands

    NEGATIVE LOGITS
    Token             Logit
    .                 -1.17
    ”.                -0.75
    ".                -0.69
    .”                -0.63
    ).                -0.63
    :                 -0.63
    ’.                -0.63
    '.                -0.60
     –                -0.58
    !                 -0.56
    POSITIVE LOGITS
    Token             Logit
     الرياضيه         1.27
     Мексичка         1.21
     kasarigan        1.18
    WriteBarrier      1.11
     كومونز           1.07
    __':              1.06
    ^(@)              1.05
    )*/               1.04
    StoryboardSegue   1.01
     Efq              1.00
    Activation Density: 0.429%

    No Known Activations