    Explanations

    patterns of justification and excusing behavior

    Negative Logits
    racak           -0.15
    (strtolower     -0.15
    HN              -0.14
    _HAND           -0.14
    eck             -0.14
     Stout          -0.13
     Mundo          -0.13
    ยม              -0.13
    stva            -0.13
    åİ              -0.13
    Positive Logits
     justification  0.18
    овари           0.17
     justify        0.16
    طاق             0.15
    фи              0.15
    orney           0.14
    away            0.14
     foreign        0.14
     why            0.14
    quals           0.14
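Feature dashboards of this kind commonly build their positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The page does not say how its lists were computed, so the sketch below only assumes that method; `decoder_direction`, `unembedding`, and the toy vectors are hypothetical placeholders, not real model weights.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembedding.items()
    }


def top_and_bottom(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # smallest (most negative) scores first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical 3-dimensional toy example; real models use far larger
    # hidden sizes and vocabularies.
    direction = [0.5, -0.2, 0.1]
    toy_unembedding = {
        " justification": [0.3, -0.1, 0.2],
        " justify": [0.2, -0.2, 0.1],
        " Stout": [-0.2, 0.1, 0.0],
        "racak": [-0.3, 0.0, -0.1],
    }
    pos, neg = top_and_bottom(logit_effects(direction, toy_unembedding), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Slicing the reversed ranking (rather than `ranked[-k:]`) keeps `k=0` returning an empty list instead of the whole vocabulary.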
    Activation Density 0.195%
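Activation density on such dashboards is usually the fraction of sampled tokens on which the feature's activation is positive, so 0.195% corresponds to roughly 195 of every 100,000 tokens. The sketch below assumes that definition with a default threshold of zero; the function names and the sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


def format_percent(fraction: float) -> str:
    """Render a fraction as a percentage with three decimals, e.g. 0.195%."""
    return f"{fraction * 100:.3f}%"


if __name__ == "__main__":
    # Hypothetical sample: 195 active tokens out of 100,000.
    sample = [1.0] * 195 + [0.0] * (100_000 - 195)
    print(format_percent(activation_density(sample)))  # prints 0.195%
```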

    No Known Activations