INDEX
    Explanations
    Negative Logits
    Token           Value
    " heal"         0.43
    " JM"           0.42
    " reinforce"    0.41
    " irony"        0.40
    " impulse"      0.40
    " ảo"           0.40
    " verify"       0.40
    "FP"            0.40
                    0.40
    " broaden"      0.39
    Positive Logits
    Token           Value
    "ײ"             0.45
    "ophila"        0.44
    "inės"          0.42
    " المقرر"       0.40
    "ării"          0.39
    "سكر"           0.39
    " Wag"          0.39
    " Notices"      0.39
    "ssss"          0.38
    "MeToo"         0.38
    Activation Density: 0.026%

    No Known Activations