    Negative Logits
     waren      0.46
     campuran   0.44
     errori     0.43
     tenaga     0.42
     eccles     0.41
     funkcji    0.41
    اث          0.40
     arent      0.40
     هستند      0.39
     são        0.39
    Positive Logits
     попыта         0.52
     independently  0.47
     попро          0.44
     confidently    0.44
     активно        0.43
     постара        0.43
     пере           0.43
     diligently     0.43
     почув          0.39
     flexibly       0.39
    Activation Density: 0.007%

    No Known Activations