INDEX
    Explanations

    profanity and offensive language

    affirmations or denials within discussion contexts

    Negative Logits
     curfew       -0.35
     dams         -0.32
     earthquakes  -0.31
     genomes      -0.31
     bases        -0.30
     reactors     -0.30
     pores        -0.30
     polygamy     -0.29
     jails        -0.29
     roofs        -0.28
    Positive Logits
    orp           0.35
    perty         0.35
    icably        0.35
    Ax            0.35
    uine          0.33
    orne          0.33
    omew          0.33
     Helpful      0.32
    REDACTED      0.32
    ohn           0.32
    Activation Density: 2.058%

    No Known Activations