INDEX
    Explanations
    Negative Logits
    Token          Logit
    " Rhe"         -0.07
    " yaşlı"       -0.06
    " رف"          -0.06
    " проти"       -0.06
    " *&"          -0.06
    " afforded"    -0.06
    "ümü"          -0.06
    " afford"      -0.06
    "Abb"          -0.06
    " oppose"      -0.06
    Positive Logits
    Token          Logit
    "(in"          0.11
    "in"           0.11
    " In"          0.11
    "IN"           0.10
    " IN"          0.10
    "In"           0.10
    "\tIn"         0.09
    "_in"          0.08
    ".In"          0.08
    "/in"          0.08
    Activation Density: 0.119%

    No Known Activations