INDEX
    Explanations

    mentions of "Che" or relevant terms associated with cheating behavior

    Negative Logits
    Token        Logit
    "loo"        -0.19
    "ners"       -0.18
    "alez"       -0.16
    "ouro"       -0.16
    "SION"       -0.15
    "ra"         -0.15
    "arkan"      -0.15
    "bserv"      -0.15
    "neo"        -0.15
    "ãĥ¼ãĥį"     -0.15
    Positive Logits
    Token        Logit
    "vron"       0.23
    "vrolet"     0.20
    " Che"       0.20
    "-che"       0.19
    "erokee"     0.18
    "aper"       0.17
    " che"       0.17
    "ating"      0.17
    "pch"        0.17
    "apest"      0.16
    Activation Density: 0.010%

    No Known Activations