INDEX
    Explanations

    references to leftist ideologies and movements

    Negative Logits
    ously     -0.18
    egin      -0.17
    rna       -0.17
    ptions    -0.16
    erate     -0.15
    aeper     -0.15
    rung      -0.15
    risk      -0.14
    edin      -0.14
    itag      -0.14
    Positive Logits
    ward      0.25
    wards     0.22
    -hand     0.22
    ness      0.20
    /right    0.20
    most      0.18
    ened      0.18
    ت         0.18
    -wing     0.17
    eous      0.17
    Act Density 0.034%

    No Known Activations