INDEX
    Explanations

    words related to specific ethnic or cultural identities

    Negative Logits
    sar         -0.20
    ted         -0.19
    ses         -0.19
    go          -0.18
    shan        -0.17
    tt          -0.17
    ger         -0.16
    gable       -0.16
    gie         -0.16
    scape       -0.16

    Positive Logits
    apolis       0.31
    ism          0.28
    alysis       0.28
    isme         0.24
    thus         0.23
    ische        0.22
    -American    0.21
    omics        0.21
    stvo         0.20
    ismo         0.20

    Activation Density: 0.093%

    No Known Activations