INDEX
    Explanations

    label categories and named entities

    Negative Logits
    Token           Logit
    op              1.60
    (blank)         1.49
    є               1.44
    (blank)         1.44
    ad              1.43
    ge              1.42
    то              1.41
     vistos         1.41
    你不            1.38
     affiche        1.37
    Positive Logits
    Token           Logit
    дная            1.65
    ிய              1.61
    ❤️❤️            1.57
    Czas            1.53
    gF              1.53
    (blank)         1.51
    bhar            1.49
    (blank)         1.46
    версите         1.46
    (blank)         1.43
    Activation Density: 0.178%

    No Known Activations