INDEX
    Explanations
    Negative Logits
    Token         Logit
    "iore"        -0.07
    " сделать"    -0.06
    "riterion"    -0.06
    "чески"       -0.06
    "pace"        -0.06
    "rong"        -0.06
    " прям"       -0.06
    " قض"         -0.06
    " Hotels"     -0.06
    " h"          -0.06
    Positive Logits
    Token         Logit
    " despre"     0.07
    "FM"          0.07
    "(skb"        0.07
    "/we"         0.06
    "_WATER"      0.06
    " نامه"       0.06
    "fre"         0.06
    " BUFF"       0.06
                  0.06
    "(pp"         0.06
    Act Density 0.003%

    No Known Activations