INDEX
    Explanations
    Negative Logits
    Token           Logit
    .MM             -0.09
    _bc             -0.07
    (Room           -0.07
     minority       -0.07
    ضر              -0.07
     residence      -0.07
     kinda          -0.07
    isson           -0.07
                    -0.07
     population     -0.07
    Positive Logits
    Token           Logit
     sau            0.07
                    0.07
    我们在          0.07
    )>↵             0.06
    claim           0.06
     Saturday       0.06
    하기            0.06
    ua              0.06
    Posting         0.06
     sociology      0.06
    Act Density 0.000%

    No Known Activations