INDEX
    Explanations

    words emphasizing equality, community, and the importance of all individuals

    Negative Logits
    ãĥĥãĤ·ãĥ¥       -0.19
    eum             -0.18
    ifr             -0.15
    sted            -0.14
    dat             -0.14
    uja             -0.14
    stad            -0.14
    stab            -0.14
    ÅĻÃŃd          -0.14
     ofType         -0.13
    Positive Logits
    ÑĸнÑĮ          0.15
    rieve           0.15
    ibold           0.15
    ayi             0.14
    REFERRED        0.14
    adir            0.14
    /testify        0.13
    365             0.13
    reative         0.13
    neh             0.13
    Act Density 0.708%

    No Known Activations