INDEX
    Explanations

    instances of profane language or derogatory language

    Negative Logits
    Token                               Logit
    "HCR"                               -1.18
    "fman"                              -1.11
    "gary"                              -0.97
    "================================"  -0.96
    " Expend"                           -0.96
    "ocamp"                             -0.96
    "NetMessage"                        -0.96
    "ervation"                          -0.92
    "CVE"                               -0.90
    "AUT"                               -0.90
    Positive Logits
    Token         Logit
    "bags"        1.27
    "posts"       1.20
    "storm"       1.15
    "loads"       1.14
    " detector"   1.14
    "heads"       1.13
    " detectors"  1.10
    "faced"       1.08
    "lords"       1.07
    "lord"        1.06
    Activation Density: 0.714%

    No Known Activations