    Explanations

    references to political affiliations or actions related to defiance against authority

    Negative Logits
      " CWE"               -0.79
      " gainera"           -0.71
      " Wicidata"          -0.71
      " cherchés"          -0.67
      " disambiguazione"   -0.66
      " Италијани"         -0.63
      "EndGlobalSection"   -0.63
      "كويكب"              -0.62
      "Vidite"             -0.59
      " oprot"             -0.54
    Positive Logits
      " refusal"           2.05
      " refused"           1.90
      " refus"             1.89
      " refuse"            1.85
      " refusing"          1.79
      " refuses"           1.72
      " rejection"         1.69
      " reject"            1.63
      " rejecting"         1.62
      " refuser"           1.61
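    Logit tables like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the highest- and lowest-scoring vocabulary tokens. The sketch below assumes that method; whether this dashboard also folds in a final LayerNorm scale or other normalization is not stated here, and the names `decoder_dir`, `unembed`, and `top_logit_effects` are hypothetical.

```python
from typing import List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by decoder_dir @ W_U and return the extremes.

    decoder_dir: hypothetical feature decoder vector, length d_model.
    unembed: hypothetical W_U as a d_model x vocab_size nested list.
    Returns (top-k most positive, top-k most negative) (token, score) pairs,
    positives in descending order, negatives most-negative first.
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = top_logit_effects(
    [1.0, 0.0],
    [[2.0, 0.5, -0.8], [0.0, 1.0, 1.0]],
    ["refuse", "accept", "CWE"],
    k=1,
)
print(pos, neg)  # [('refuse', 2.0)] [('CWE', -0.8)]
```

    Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other, which matches the ordering shown in the tables above.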
    Activation Density: 0.746% (share of sampled tokens on which the feature fires)
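    A density of 0.746% means the feature fires on roughly 7 or 8 of every 1,000 sampled tokens. The sketch below assumes density is simply the fraction of tokens whose activation exceeds zero; the `activation_density` helper and its `threshold` parameter are hypothetical, and the dashboard's actual sampling corpus is not specified here.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy sample: 1,000 tokens, 7 of which fire.
sample = [0.0] * 993 + [1.2] * 7
print(f"{activation_density(sample):.3%}")  # 0.700%
```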

    No Known Activations