INDEX
    Explanations
    Negative Logits
    " 당연"          0.39
    "各种"           0.38
    "/\/"            0.35
    "各类"           0.34
    " धना"           0.34
    "NSIndexPath"    0.34
    "今年も"         0.34
    "ísmo"           0.33
    "看来"           0.33
    " ఇక"            0.33
    POSITIVE LOGITS
    " отличие"            2.36
    " unlike"             2.30
    " Unlike"             2.22
    "Unlike"              2.13
    " differs"            2.09
    " отличи"             2.09
    "unlike"              2.09
    " отлича"             2.02
    " distinguishes"      1.97
    " differentiates"     1.97
    Activation Density: 0.073%

    No Known Activations