    Explanations

    phrases that indicate refusal or rejection

    Negative Logits
     interchange      -0.54
     accent           -0.52
     crim             -0.50
     cheers           -0.50
    èĢħ               -0.49
     accompanying     -0.48
     corro            -0.47
     deterior         -0.47
     congratulate     -0.45
     lia              -0.45
    Positive Logits
    ·       0.63
    ©       0.61
    ĩ       0.59
    ī       0.59
    ħ       0.56
    orial   0.55
    IJ      0.54
    ĵĺ      0.54
    °       0.53
    ®       0.53
    Act Density 2.620%

    No Known Activations