INDEX
    Explanations

    occurrences of gender-specific nouns and verbs related to criminal behavior

    Negative Logits
    Token     Logit
    anni      -0.16
    arel      -0.15
    ç¿        -0.15
    388       -0.14
    058       -0.14
    achs      -0.14
    utsche    -0.14
    BÃłi      -0.14
    èij       -0.13
    दर        -0.13
    Positive Logits
    Token        Logit
    кав          0.14
    apore        0.14
    oron         0.14
    çķª          0.14
    erves        0.14
    .ix          0.14
    [assembly    0.14
    ãģ£ãģ¡     0.14
    ahren        0.14
    rana         0.13
    Activation Density: 0.195%

    No Known Activations