    Explanations

    additional, new, or built

    Negative Logits
    Token            Value
    " יכול"          0.45
    " ہوسک"          0.41
    " असू"           0.39
    " hiểu"          0.38
    " nedir"         0.37
    " ہوسکتا"        0.36
    " veramente"     0.35
    " hona"          0.35
    " kebanyakan"    0.34
    " maaaring"      0.34
    Positive Logits
    Token            Value
    " يتم"           0.61
    "มีการ"          0.54
    " we"            0.52
    "various"        0.50
    " vengono"       0.49
    " additional"    0.49
    " Additional"    0.48
    "additional"     0.48
    "建立了"         0.47
    "新たに"         0.47
    Activation Density: 0.057%

    No Known Activations