    Negative Logits
    Token               Value
    " Phần"             0.41
    " 식으로"           0.40
    " منابع"            0.38
    "හාර"               0.36
    " ойно"             0.36
    " ډول"              0.36
    "izowane"           0.36
    " наличи"           0.36
    (token missing)     0.36
    "ارية"              0.35

    Positive Logits
    Token               Value
    " activities"       0.73
    " actions"          0.69
    " something"        0.66
    "activities"        0.64
    " acts"             0.62
    " atividades"       0.61
    "Activities"        0.60
    "Actions"           0.60
    " actividades"      0.60
    " действия"         0.58
    Activation Density: 0.021%

    No Known Activations