INDEX
    Explanations

    strong, critical adjectives or phrases

    emotionally charged adjectives that convey strong criticism or commentary

    Negative Logits
    hops      -0.95
    qqa       -0.81
    gat       -0.78
    slave     -0.77
    cells     -0.76
    ween      -0.76
    chwitz    -0.73
    plane     -0.73
    Jump      -0.71
     Stores   -0.70
    Positive Logits
     commentary       1.29
     rebuke           1.23
     indictment       1.23
     critique         1.20
     rebutt           1.19
     understatement   1.16
     remarks          1.14
     statement        1.14
     conclusion       1.11
     explanation      1.08
    Act Density 0.233%

    No Known Activations