INDEX
    Explanations

    various forms of the word "race" and related terms

    Negative Logits
    shire        -0.19
    aires        -0.16
    anguages     -0.16
    589          -0.15
    -HT          -0.15
    ately        -0.15
    ency         -0.15
    self         -0.15
    sh           -0.15
    ness         -0.15

    Positive Logits
    horse         0.19
    erp           0.16
    TokenType     0.16
    /umd          0.16
    presso        0.16
    LAN           0.15
    course        0.14
    ovní          0.14
    dirty         0.14
    ourcem        0.14

    Act Density 0.028%

    No Known Activations