    Negative Logits
        Token          Value
        "("            0.76
        "يد"           0.70
        "ER"           0.66
        "UM"           0.66
        "AT"           0.65
        "EX"           0.65
        " premiere"    0.61
        "েন"           0.61
        "ньше"         0.61
        "Aqui"         0.61
    Positive Logits
        Token                  Value
        " tật"                 0.73
        "o"                    0.73
        (token not shown)      0.73
        (token not shown)      0.71
        "definitions"          0.70
        (token not shown)      0.70
        "factors"              0.70
        "рка"                  0.67
        " 제작"                0.67
        " دھو"                 0.66
    Activation Density: 0.004%

    No Known Activations