    Negative Logits
    "in"  0.58
    " creencias"  0.57
    "inje"  0.55
    "inian"  0.55
    "inase"  0.53
    " சிகி"  0.51
    "疾患"  0.50
    "inib"  0.49
    "avoro"  0.49
    " कारण"  0.48
    Positive Logits
    "ال"  0.51
    "fh"  0.45
    "ق"  0.42
    "Secondary"  0.42
    "ފ"  0.42
    "δη"  0.40
    "ظهر"  0.40
    " excessive"  0.39
    "тся"  0.39
    "Outgoing"  0.39
    Activation Density: 0.002%

    No Known Activations