INDEX
    Explanations

    personal pronouns and references to individuals

    Negative Logits
    "azas"             -0.54
    "RegressionTest"   -0.51
    "uable"            -0.49
    "Datuak"           -0.48
    "lapping"          -0.48
    "انجليز"           -0.47
    " Stampa"          -0.46
    "tainment"         -0.46
    " TType"           -0.46
    "Potential"        -0.46
    Positive Logits
    " он"              0.84
    " Он"              0.79
    " она"             0.71
    " Она"             0.68
    "Она"              0.68
    "Он"               0.68
    " оно"             0.66
    " Оно"             0.63
    " мы"              0.63
    " він"             0.61
    Activation Density: 0.002%

    No Known Activations