INDEX
    Explanations

    articles and identifying word types

    NEGATIVE LOGITS
    ע    0.44
    니다    0.44
    აპ    0.44
    А    0.43
    ಮನ    0.42
    MARK    0.42
    Prot    0.41
    פ    0.41
    UNKNOWN    0.41
     pręd    0.40
    POSITIVE LOGITS
     использованием    0.48
     algumas    0.46
     folos    0.41
     dishes    0.41
     alguns    0.41
     situa    0.41
     Reihe    0.41
     kasus    0.41
     algunas    0.40
     dépour    0.40
    Activation Density: 0.000%

    No Known Activations