INDEX
    Explanations

    phrases indicating deception or disguising intentions

    Negative Logits
    Token           Logit
    'options        -0.17
    mort            -0.15
    omik            -0.14
    agnostic        -0.14
    .mods           -0.14
     ÃĸÄŁren        -0.14
    ofile           -0.14
    atoms           -0.14
    골              -0.14
    .lift           -0.14
    Positive Logits
    Token           Logit
     excuses        0.19
     excuse         0.18
    icht            0.17
     justification  0.17
     justify        0.17
    ảng             0.16
    901             0.16
    arger           0.15
    ady             0.15
     claimed        0.14
    Activation Density: 0.230%

    No Known Activations