INDEX
    Explanations

    references to individuals or groups of people

    NEGATIVE LOGITS
    (es
    -0.22
     itself
    -0.17
    wner
    -0.16
    ï¸ı
    -0.15
    人物
    -0.15
    berg
    -0.15
    undi
    -0.15
    율
    -0.15
    stadt
    -0.15
    ayne
    -0.15
    POSITIVE LOGITS
     who
    0.30
    /entities
    0.24
    who
    0.23
     whom
    0.23
    /groups
    0.20
     Who
    0.20
     الذين
    0.20
    士
    0.19
    hood
    0.19
    age
    0.19
    Act Density 0.113%

    No Known Activations