INDEX
    Explanations

    potential conditionality or uncertainty in statements

    Negative Logits
    "ledo"      -0.15
    "inia"      -0.15
    "ariate"    -0.14
    "presso"    -0.14
    "oster"     -0.14
    "p"         -0.14
    "uin"       -0.13
    "iyat"      -0.13
    "elia"      -0.13
    "ujet"      -0.13
    Positive Logits
    "hem"       0.20
    "nard"      0.19
    "jím"       0.17
    "/all"      0.17
    "όρ"        0.17
    "onna"      0.16
    " saja"     0.16
    "큼"        0.16
    " be"       0.15
    "ones"      0.15
    Act Density 0.089%

    No Known Activations