INDEX
    Explanations

    phrases that convey important actions or decisions impacting groups or communities

    Negative Logits
    ulant
    -0.16
    ueur
    -0.15
    ิบ
    -0.14
     Intercept
    -0.14
    gers
    -0.13
    于是
    -0.13
     nth
    -0.13
    .IContainer
    -0.13
    ifacts
    -0.13
    itorio
    -0.13
    Positive Logits
     ihn
    0.20
     Them
    0.17
     sie
    0.16
    sie
    0.16
     THEM
    0.16
     eux
    0.16
    alla
    0.15
    ergus
    0.15
    其
    0.15
     them
    0.15
    Act Density 0.172%

    No Known Activations