INDEX
    Negative Logits
    Token                    Value
    "𒋗"                      0.39
    " necessário"            0.36
    " परहेज"                  0.36
    " berharap"              0.36
    " مدیریت"                0.35
    "ʖ"                      0.34
    "siniz"                  0.33
    " قیادت"                 0.33
    (token not rendered)     0.33
    " İstifadə"              0.33
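    Positive Logits
    Token                    Value
    ","                      0.66
    "،"                      0.59
    (token not rendered)     0.55
    " ,"                     0.46
    (token not rendered)     0.46
    (token not rendered)     0.45
    " ،"                     0.38
    " strikingly"            0.38
    (token not rendered)     0.37
    " indeed"                0.37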
    Act Density 0.101%

    No Known Activations