INDEX
    Explanations
    NEGATIVE LOGITS
    "،"              0.47
    " operatives"    0.45
    " estab"         0.44
    " mensen"        0.44
    " ->"            0.43
    " utilizzare"    0.42
    " aplik"         0.42
    " mores"         0.42
    "ד"              0.42
    " scientist"     0.41
    POSITIVE LOGITS
    " трудно"        0.56
    (token missing)  0.44
    "АТ"             0.43
    "PE"             0.40
    "vendo"          0.40
    " प्रतीत"         0.40
    "构建"           0.39
    "ígono"          0.39
    "စိတ်အပိုင်း"      0.39
    "پيديا"          0.38
    Activation Density: 0.001%

    No Known Activations