INDEX
    Explanations

    references to deception or pretense

    Negative Logits
    /fw             -0.16
    DialogTitle     -0.15
    .dateTime       -0.14
    ronym           -0.14
    ела             -0.14
    udes            -0.14
    fw              -0.14
    ongan           -0.14
    иÑģÑģ           -0.14
    «ĺ              -0.14

    Positive Logits
    ment            0.17
    aly             0.17
    rosse           0.16
    endi            0.15
     dụ             0.15
     motivational   0.14
     cover          0.14
    enty            0.14
     Cover          0.14
     Urban          0.14
    Act Density 0.112%

    No Known Activations