INDEX
    Explanations

    conjunctions indicating purpose or reason

    Negative Logits
    "rag"        -0.17
    "åĻ"         -0.15
    "iversit"    -0.15
    "prising"    -0.14
    "amon"       -0.14
    " Uhr"       -0.14
    "zt"         -0.14
    "ige"        -0.14
    "ίÏīν"       -0.14
    "kelig"      -0.14
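Several tokens in these lists (åĻ above; ìį¨ and Łèĥ½ in the positive list) look garbled but are raw vocabulary strings. The page does not name the tokenizer. If the model uses a GPT-2-style byte-level BPE tokenizer, each character stands in for one byte, and a single token may hold only part of a multi-byte UTF-8 character. The sketch below reimplements GPT-2's byte-to-character table and reverses it. `decode_token` is a hypothetical helper, and it expects the raw vocabulary form, in which a leading space is written as `Ġ`.

```python
# Assumes a GPT-2-style byte-level BPE vocabulary (not confirmed by the page).

def bytes_to_unicode():
    """Rebuild GPT-2's byte -> printable stand-in character table."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point above 255.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Inverse table: stand-in character -> raw byte.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token):
    """Convert a byte-level vocabulary string back into readable text.

    Tokens holding only part of a multi-byte UTF-8 character decode to
    U+FFFD replacement characters instead of raising an error.
    """
    raw = bytes(CHAR_TO_BYTE[c] for c in token)
    return raw.decode("utf-8", errors="replace")

print(decode_token("ìį¨"))   # a complete Hangul syllable
print(decode_token("Łèĥ½"))  # a stray continuation byte, then a CJK character
```

Under that assumption, ìį¨ decodes to the Korean syllable 써, and Łèĥ½ decodes to a replacement character followed by 能.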
    Positive Logits
    " that"      0.20
    "that"       0.19
    "ìį¨"        0.17
    " rằng"      0.17
    "Łèĥ½"       0.16
    "ovice"      0.15
    " forth"     0.15
    "aps"        0.15
    " että"      0.15
    "425"        0.14
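The page does not say how these logit lists are produced. For sparse-autoencoder features, one common convention is to project the feature's decoder direction through the model's unembedding matrix and read off the largest and smallest entries. The sketch below assumes that convention. `decoder_vec`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical names, and the toy matrices are made up purely for illustration.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens.

    Assumes a feature's logit effect is its decoder direction projected
    through the unembedding matrix W_U (shape: d_model x vocab_size).
    """
    effects = decoder_vec @ W_U          # one value per vocabulary entry
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = [" that", "rag", " forth", "ige"]
W_U = np.array([[1.0, -1.0, 0.5,  0.0],
                [0.0,  0.5, 0.0, -0.5]])
decoder_vec = np.array([0.2, 0.1])
pos, neg = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print(pos)
print(neg)
```

Sorting once and slicing both ends avoids computing the positive and negative lists separately.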
    Activation Density: 0.053%
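Activation density is commonly understood as the share of sampled tokens on which a feature's activation is above zero. At 0.053%, that is about 5 of every 10,000 tokens. A minimal sketch follows, assuming that definition and a threshold of 0. The page does not state either one. The function name and the toy counts are illustrative only and are not the data behind the reported figure.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens on which a feature's activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy data: the feature fires on 53 of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:53] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.053%
```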

    No Known Activations