INDEX
    Explanations

    themes related to equality and inclusivity

    Negative Logits
    Token     Logit
    "asan"    -0.19
    "iken"    -0.16
    "tere"    -0.16
    "heel"    -0.16
    "eza"     -0.16
    "opy"     -0.16
    "artz"    -0.15
    "gid"     -0.15
    "urar"    -0.15
    "antz"    -0.15

    Positive Logits
    Token          Logit
    " everyone"    0.23
    "everyone"     0.22
    " Everyone"    0.21
    " universal"   0.20
    "Everyone"     0.20
    "bjerg"        0.20
    " age"         0.19
    " everybody"   0.19
    "Universal"    0.18
    "universal"    0.18

    Act Density 0.189%
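
The density figure above can be read as the share of token positions on which the feature is active. The sketch below illustrates that reading only; it assumes "Act Density" means the fraction of positions with activation above zero, which the page itself does not state. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 189 of them active.
sample = [1.0] * 189 + [0.0] * (100_000 - 189)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.189%
```

Under this reading, a density of 0.189% corresponds to roughly 189 active positions per 100,000 tokens.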

    No Known Activations