INDEX
    Explanations

    references to institutions, events, and prominent figures in culture and history

    Negative Logits
    "ipsis"      -0.16
    "ersiz"      -0.15
    "ynchron"    -0.15
    "gne"        -0.15
    "рави"       -0.14
    " sil"       -0.14
    "onium"      -0.14
    "uffers"     -0.14
    "unist"      -0.13
    "ansom"      -0.13
    Positive Logits
    " White"     0.23
    "White"      0.20
    ".White"     0.17
    " Whit"      0.16
    " çϽ"        0.16
    " Wh"        0.16
    " Whites"    0.16
    " WHITE"     0.16
    "_white"     0.16
    "WHITE"      0.16
    Act Density 0.018%

    No Known Activations