INDEX
    Explanations

    references to social structure and dynamics within communities

    NEGATIVE LOGITS
     is
    -0.30
     isn
    -0.28
     έχει
    -0.28
     دارد
    -0.27
     ندارد
    -0.25
     является
    -0.24
    —is
    -0.24
     خواهد
    -0.23
     has
    -0.23
    has
    -0.22
    POSITIVE LOGITS
     were
    0.91
    were
    0.73
     weren
    0.72
     Were
    0.71
    Were
    0.66
     waren
    0.53
     были
    0.53
     بودند
    0.51
     fueron
    0.49
     wurden
    0.47
    Act Density 0.423%

    No Known Activations