INDEX
    Negative Logits
    "ка"           0.84
    " to"          0.68
    "ш"            0.68
    "ו"            0.68
    "ك"            0.64
    "verfahren"    0.63
                   0.63
    "َ"            0.62
    "ра"           0.61
    "ş"            0.61
    Positive Logits
    "h"            1.14
    "i"            1.06
    "r"            1.02
    "a"            1.00
    "y"            0.97
    " aument"      0.89
    "s"            0.88
    "p"            0.87
    " uomini"      0.86
    "k"            0.86
    Activation Density: 0.042%

    No Known Activations