INDEX
    Explanations

    language reflecting strong negative emotions, particularly hate, as well as references to specific segments or categories

    Negative Logits
    Token                 Logit
    "seen"                -0.39
    "EDEFAULT"            -0.36
    " Crock"              -0.35
    "commission"          -0.35
    " crock"              -0.35
    "opposition"          -0.34
    "UserScript"          -0.34
    "don"                 -0.34
    " doz"                -0.33
    " sesama"             -0.33

    Positive Logits
    Token                 Logit
    "Segment"             0.73
    " Segment"            0.69
    " threshold"          0.66
    "HtmlAttribute"       0.65
    " hate"               0.65
    " CreateTagHelper"    0.62
    " seuil"              0.61
    " HATE"               0.60
    " Threshold"          0.60
    " segment"            0.59
    Act Density 0.074%

    No Known Activations