    Explanations

    instances of politically charged statements or events

    Negative Logits
    Token            Logit
    " according"     -0.07
    "çı"             -0.06
    "OPS"            -0.06
    "ÙģÙĩ"           -0.06
    "ÐŀÐł"           -0.06
    "ña"             -0.06
    "according"      -0.06
    "iper"           -0.06
    "WORD"           -0.06
    "ellas"          -0.06

    Positive Logits
    Token            Logit
    "esktop"         0.07
    "/REC"           0.06
    "andom"          0.06
    "/my"            0.06
    " no"            0.06
    " kata"          0.06
    "urer"           0.06
    " rám"           0.06
    "yles"           0.06
    " don"           0.06
    Activation Density: 0.012%

    No Known Activations