INDEX
    Explanations

    possessive pronouns or possessive forms indicating ownership or relationships

    Negative Logits
     the
    -0.06
    stown
    -0.06
    azz
    -0.06
    äft
    -0.06
    302
    -0.05
     your
    -0.05
    640
    -0.05
     Nich
    -0.05
     
    -0.05
    baugh
    -0.05
    Positive Logits
    지도
    0.09
     tô
    0.09
    باش
    0.08
     jich
    0.08
    テル
    0.08
    يدا
    0.08
    [`
    0.08
    jad
    0.08
    IRQ
    0.08
     lesbi
    0.08
    Act Density 0.035%

    No Known Activations