    Explanations

    references to people or pronouns in various languages

    Negative Logits
    Token             Logit
    "EndInit"         -0.83
    "таратура"        -0.79
    "esgue"           -0.78
    " Wikimédia"      -0.76
    " '{@"            -0.72
    "بوابة"           -0.72
    "Välislingid"     -0.71
    "AnchorStyles"    -0.69
    "tvr"             -0.68
    "DrawerToggle"    -0.66
    Positive Logits
    Token             Logit
    " Он"             1.42
    " он"             1.42
    "Он"              1.36
    " Оно"            1.23
    " ela"            1.17
    " она"            1.17
    " оно"            1.15
    " eles"           1.13
    " ele"            1.10
    " ellos"          1.06
    Activation Density: 0.034%

    No Known Activations