INDEX
    Explanations
    Negative Logits
    Token                Logit
    "ב"                  1.74
    "д"                  1.63
    "ó"                  1.62
    "ле"                 1.60
    "er"                 1.49
    "в"                  1.45
    "á"                  1.38
    (token not shown)    1.38
    (token not shown)    1.38
    (token not shown)    1.34
    Positive Logits
    Token      Logit
    "Men"      1.10
    "tiin"     0.98
    "uvial"    0.96
    "ydi"      0.96
    " Men"     0.95
    " "        0.94
    "I"        0.91
    "nych"     0.91
    "ya"       0.89
    "men"      0.89
    Act Density 0.007%

    No Known Activations