    Explanations

    terminology related to social justice and historical injustices

    Negative Logits
    "isplay"        -0.16
    "eeper"         -0.15
    "eyond"         -0.15
    " Terr"         -0.15
    "ee"            -0.15
    "mě"            -0.14
    "lsen"          -0.14
    " following"    -0.14
    "ccion"         -0.14
    " umb"          -0.14
    Positive Logits
    ".scalablytyped"  0.18
    " fitte"          0.15
    "ordum"           0.14
    "ραβ"             0.14
    "ifetime"         0.14
    "래"              0.14
    "ラス"            0.14
    "obe"             0.14
    "rops"            0.14
    "enden"           0.14
    Activation Density: 0.030%

    No Known Activations