INDEX
    Explanations

    mentions of non-violence, social activism or resistance

    terms associated with violence and its non-violent alternatives

    Negative Logits
    "ibal"              -0.76
    "rans"              -0.64
    "verages"           -0.64
    "saw"               -0.63
    "nan"               -0.62
    " Haf"              -0.61
    "abase"             -0.61
    "older"             -0.61
    " Lumpur"           -0.61
    "ween"              -0.61

    Positive Logits
    "theless"           0.85
    "iferation"         0.76
    "istance"           0.75
    "DragonMagazine"    0.72
    "iterranean"        0.69
    "anmar"             0.64
    "istant"            0.63
    "ances"             0.61
    "ensical"           0.61
    "otine"             0.61

    Activation Density: 0.055%

    No Known Activations