INDEX
    Explanations
    No Explanations Found
    Negative Logits
    ur               1.63
    (whitespace)     1.55
    ?                1.19
    6                1.18
    ud               1.10
    rom              1.08
    ak               1.06
    (token missing)  1.06
    ys               1.05
    -                1.05
    Positive Logits
    да               1.25
    した             1.23
    miş              1.20
    دي               1.19
    ِينَ              1.16
    (token missing)  1.10
    (token missing)  1.09
     humains         1.08
    lerden           1.07
    ்களை             1.07
    Act Density 0.000%

    No Known Activations