INDEX
    Explanations

    references to race and issues affecting people of color

    Negative Logits
    Token       Logit
    "abis"      -0.15
    "hone"      -0.15
    "ât"        -0.14
    "ÅĻej"      -0.14
    "elik"      -0.14
    "ylland"    -0.14
    "fty"       -0.14
    "iance"     -0.14
    "heid"      -0.14
    "ntag"      -0.13
    Positive Logits
    Token           Logit
    " color"        0.27
    " colour"       0.25
    " means"        0.23
    "Means"         0.23
    " goodwill"     0.22
    " Means"        0.21
    " whom"         0.20
    " integrity"    0.20
    "means"         0.19
    " substance"    0.18
    Act Density 0.030%

    No Known Activations