INDEX
    Explanations

    pronouns and possessive forms related to individuals

    Negative Logits
    $MESS          -0.17
    ncia           -0.15
    .googleapis    -0.15
    FUL            -0.15
    pis            -0.15
    $LANG          -0.14
    iciel          -0.14
    ÑĤÑı           -0.14
     Coff          -0.14
    isÃŃ           -0.14
    Positive Logits
    /her           0.25
    /she           0.22
     or            0.21
    idi            0.16
    avo            0.14
    atrix          0.14
    ik             0.14
    erif           0.14
    /h             0.14
    enberg         0.14
    Activation Density: 0.112%

    No Known Activations