INDEX
    Explanations

    strong statements opposing violence and advocating for human rights

    Negative Logits
    inding        -0.14
     quadr        -0.14
     really       -0.14
     æĿİ          -0.14
    inda          -0.14
     exactly      -0.14
     hardly       -0.14
    498           -0.13
    uben          -0.13
     Tap          -0.13
    Positive Logits
     tolerate     0.25
     toler        0.24
     acceptable   0.24
     tolerated    0.23
    _tolerance    0.22
     tolerance    0.21
    Accept        0.21
    tol           0.20
     Accept       0.20
     ACCEPT       0.20
    Act Density 0.208%

    No Known Activations