    Negative Logits
     необходимо  0.41
    享受  0.39
    MINE  0.39
    ğimiz  0.38
    0.38
    0.38
    ريخ  0.38
     comenzamos  0.38
    )\|  0.38
     مني  0.37
    Positive Logits
     am  0.45
     wondered  0.45
     wonder  0.44
     apologize  0.42
     underestimated  0.40
     concede  0.40
     apologise  0.39
     Kur  0.39
     memang  0.38
     myself  0.38
    Activation Density: 0.001%

    No Known Activations