INDEX
    Explanations

    occurrences of pronouns and inclusive language

    Negative Logits
    "kre"              -0.16
    "alfa"             -0.16
    " Intelligence"    -0.16
    "FLOW"             -0.15
    " unc"             -0.15
    "ague"             -0.14
    "/google"          -0.14
    " Equ"             -0.14
    " Progress"        -0.14
    "g"                -0.14
    Positive Logits
    " دÙģ"             0.17
    "ÅĻeh"             0.17
    "oldem"            0.15
    "ÛĮØ·"            0.15
    " ÐĿÑĥ"           0.15
    "miner"            0.15
    " watt"            0.14
    "<path"            0.14
    "ãĥ³ãĥĨ"          0.14
    "Dispatch"         0.14
    Activation Density: 0.001%

    No Known Activations