    NEGATIVE LOGITS
    " GPS"          -0.07
    " yerde"        -0.07
    " ש"            -0.07
    "victim"        -0.06
    "编号"          -0.06
    "United"        -0.06
    "олет"          -0.06
    "Matches"       -0.06
    " quad"         -0.06
    "$res"          -0.06
    POSITIVE LOGITS
    " takeaway"      0.10
    " ***!↵"         0.07
                     0.07
    " withholding"   0.06
    "/comments"      0.06
    " trolls"        0.06
    ".opensource"    0.06
    " adv"           0.06
    " wiki"          0.06
    "adding"         0.06
    Activation Density 0.004%

    No Known Activations