INDEX
    Explanations

    references to cheating or dishonesty

    Negative Logits
    Token         Logit
    " Sten"       -0.76
    "uParam"      -0.73
    " Ví"         -0.69
    "àn"          -0.69
    " Objekte"    -0.69
    " Masson"     -0.68
    "komo"        -0.68
    " Dele"       -0.68
    " objects"    -0.66
    " Objek"      -0.66
    Positive Logits
    Token         Logit
    " Che"        1.51
    "Che"         1.42
    " CHE"        1.38
    " che"        1.37
    " cheetah"    1.27
    " Cheyenne"   1.23
    " cheating"   1.22
    "CHE"         1.20
    " Chel"       1.15
    "che"         1.11
    Activation Density: 0.015%

    No Known Activations