    Negative Logits
    Token              Value
    "ó"                1.07
    (missing)          0.93
    "nél"              0.91
    "imizi"            0.77
    "ére"              0.76
    (missing)          0.75
    "and"              0.75
    "ává"              0.73
    " I"               0.73
    "nici"             0.73
    Positive Logits
    Token              Value
    "an"               1.15
    "ان"               1.14
    "ва"               1.05
    " an"              1.01
    "ма"               0.93
    " он"              0.86
    " isn"             0.85
    (missing)          0.82
    (missing)          0.82
    "that"             0.79
    Activation Density: 0.027%

    No Known Activations