INDEX
    Explanations

    relationships involving power dynamics, roles, and social structures

    Negative Logits
    "illez"      -0.16
    "ebra"       -0.16
    "irth"       -0.16
    "едини"      -0.16
    "IRTH"       -0.15
    "gid"        -0.15
    " gid"       -0.14
    "raki"       -0.14
    "gree"       -0.14
    "ihu"        -0.14
    Positive Logits
    " vs"         0.19
    "-vers"       0.18
    " versus"     0.17
    "-vs"         0.16
    "isp"         0.16
    " followed"   0.15
    " Outdoor"    0.15
    "Uvs"         0.15
    "auc"         0.15
    " Ã"          0.15
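    As a rough illustration only: assuming the two lists above are just the k most negative and k most positive entries of a per-token logit-effect vector (the page does not say how they were produced), the selection could be sketched as follows. The helper name `top_and_bottom` is hypothetical, and the five token/value pairs are copied from the lists purely as toy input.

```python
def top_and_bottom(tokens, effects, k):
    """Return the k most negative and k most positive (token, effect) pairs.

    Hypothetical helper: assumes effects is one float per vocabulary token.
    """
    pairs = list(zip(tokens, effects))
    # Sort ascending by effect: the head holds the most negative entries.
    ranked = sorted(pairs, key=lambda p: p[1])
    negative = ranked[:k]
    # Reverse the ascending order so the head holds the most positive entries.
    positive = ranked[::-1][:k]
    return negative, positive


# Toy input: a handful of tokens and values taken from the lists above.
tokens = [" vs", " versus", "illez", "ebra", " gid"]
effects = [0.19, 0.17, -0.16, -0.16, -0.14]
neg, pos = top_and_bottom(tokens, effects, 2)
print("negative:", neg)
print("positive:", pos)
```

    A real vocabulary has tens of thousands of entries, so a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid a full sort; the plain sort is kept here for readability.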
    Act Density 0.175%
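    A minimal sketch of what a density figure like this usually expresses, assuming it is the percentage of token positions at which the feature's activation is above zero (an assumption; the page gives only the number). The function `activation_density` and its `threshold` parameter are hypothetical, and the activation values are made up.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds threshold.

    Hypothetical definition: "active" means strictly greater than threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 2 active positions out of 1000.
acts = [0.0] * 998 + [1.3, 0.4]
print(f"Act Density {activation_density(acts):.3f}%")
```

    Under this definition, 0.175% would mean the feature fires on roughly 1.75 of every 1,000 tokens in whatever corpus was sampled.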

    No Known Activations