INDEX
    Explanations

    instances of refusal or negation actions

    Negative Logits
    setVerticalGroup    -0.89
     iprot              -0.84
     CreateTagHelper    -0.69
    hoeddwyd            -0.66
    发表于              -0.61
    LookAnd             -0.61
     silly              -0.59
    fjspx               -0.58
     giggle             -0.58
     елның              -0.57

    Positive Logits
     refuse              1.21
     refused             1.20
     refuses             1.18
     refusal             1.15
     Refuse              1.12
     refusing            1.12
     hesitate            0.98
     refus               0.80
     refuser             0.79
     hesitation          0.77
    Act Density 0.084%

    No Known Activations