INDEX
    Explanations

    references to causality and violation of rules or expectations

    Follows prepositions or punctuation

    Negative Logits
    " يتيمه"        -1.03
    "}$​"            -0.84
    " Normdatei"    -0.82
    " تانيه"        -0.79
    " estekak"      -0.72
    " tfsi"         -0.72
    "tserrat"       -0.71
    "manjaro"       -0.70
    " cdti"         -0.68
    "êques"         -0.68
    Positive Logits
    " her"      1.31
    " she"      1.15
    " his"      1.13
    " him"      1.01
    " he"       0.99
    " their"    0.92
    " they"     0.83
    "She"       0.83
    " He"       0.81
    "she"       0.79
    Activation Density: 3.943%

    No Known Activations