    Negative Logits
    ftant         -0.84
     يتيمه        -0.84
    )•            -0.76
    felves        -0.76
    ftances       -0.73
    esterday      -0.72
    $")           -0.72
    ?")           -0.69
    "):\n         -0.69
    chyma         -0.69

    Positive Logits
    .             0.55
    ,             0.51
     and          0.48
    <bos>         0.48
    '             0.47
    ;             0.46
    (             0.46
    Życiorys      0.46
    s             0.45
     استنادى      0.44
    Act Density 0.114%

    No Known Activations