INDEX
    Explanations

    First-person pronouns

    Negative Logits
     منها         -0.07
    232           -0.07
     callback     -0.06
    Io            -0.06
     Kom          -0.06
     Hiện         -0.06
    002           -0.06
     Gaw          -0.06
     desar        -0.06
     jaký         -0.06
    Positive Logits
     demolished   0.07
     realise      0.07
     обеспечива   0.06
     mrb          0.06
     tasked       0.06
     freel        0.06
     noi          0.06
     فرمود        0.06
     Прот         0.06
                  0.06
    Activation Density: 0.026%

    No Known Activations