    Explanations

    social concepts and interactions

    Negative Logits
    وم
    0.88
    on
    0.72
    يا
    0.68
    ك
    0.68
    كيد
    0.65
    ين
    0.61
    рили
    0.59
    ح
    0.58
    سا
    0.57
    0.56
    Positive Logits
    t
    1.09
     socially
    1.06
     सामाजिक
    1.05
     sociais
    1.03
     sociali
    1.02
     social
    1.01
     اجتماعی
    1.01
     sosial
    1.00
    SOCIAL
    1.00
     사회
    0.98
    Act Density 0.033%

    No Known Activations