INDEX
    Explanations

    references to cheating or infidelity

    Negative Logits
    Token       Logit
    "loo"       -0.18
    "æ³Ĭ"       -0.17
    "bserv"     -0.17
    "лек"       -0.16
    "arkan"     -0.15
    "ÑĢави"     -0.14
    "ÑĢин"      -0.14
    "SION"      -0.14
    "оваÑĢи"    -0.14
    "ãĥ¥"       -0.14
    Positive Logits
    Token       Logit
    " Che"      0.26
    " che"      0.25
    "-che"      0.25
    "vron"      0.24
    "Che"       0.23
    "vrolet"    0.23
    "ating"     0.20
    "apest"     0.20
    "aper"      0.20
    "CHE"       0.20
    Activation Density: 0.008%

    No Known Activations