INDEX
    Explanations

    cross-lingual representation

    Negative Logits
    зив             0.42
     ruthless       0.42
    alya            0.41
    aa              0.41
    ซีน             0.40
     universities   0.40
    тук             0.39
     Wissenschaft   0.39
    andinavian      0.39
     වැඩ            0.39
    Positive Logits
     مرد            0.41
     উদ্বাস্ত       0.41
     anu            0.39
    ንድ              0.39
    ንዱ              0.39
     पिच            0.39
    Segundo         0.38
     ناح            0.38
                    0.38
     teammates      0.37
    Act Density 0.003%

    No Known Activations