    Explanations

    Social issues

    Negative Logits
    Token          Logit
    "中央"         -0.07
    "square"       -0.07
                   -0.06
    "گری"          -0.06
    " Switch"      -0.06
    "spam"         -0.06
    " السعود"      -0.06
    " Sanders"     -0.06
    "SORT"         -0.06
    "Mail"         -0.06
    Positive Logits
    Token               Logit
    " Lim"              0.07
    "атив"              0.07
    " queen"            0.06
    " socioeconomic"    0.06
    "roduce"            0.06
    " qualifier"        0.06
    " revive"           0.06
    " lamp"             0.06
    "predicted"         0.06
    "chter"             0.06
    Activation Density: 0.045%

    No Known Activations