INDEX
    Explanations

    phrases indicating actions or attributes of individuals

    Negative Logits
    Token           Logit
    " itself"       -0.22
    "acher"         -0.16
    "abelle"        -0.15
    " Stap"         -0.15
    "igor"          -0.14
    "ubat"          -0.14
    "796"           -0.14
    "ÃŃl"           -0.14
    " themselves"   -0.14
    "igi"           -0.13
    Positive Logits
    Token           Logit
    "/her"          0.24
    " himself"      0.19
    "/she"          0.19
    "arken"         0.19
    "eyse"          0.17
    "radan"         0.17
    "ulk"           0.15
    "iat"           0.15
    "MES"           0.15
    "upported"      0.15
    Activation Density: 0.666%

    No Known Activations