INDEX
    Explanations

    references to a nation or national identity

    Negative Logits
    Token      Value
    "sse"      -0.17
    "orie"     -0.17
    " Nic"     -0.17
    "ors"      -0.17
    "nice"     -0.17
    "ly"       -0.15
    "lyn"      -0.15
    "ory"      -0.15
    " Nice"    -0.15
    "lett"     -0.15
    Positive Logits
    Token      Value
    "wide"     0.31
    "hood"     0.28
    "ally"     0.27
    "nal"      0.27
    "alse"     0.26
    "als"      0.23
    "ALSE"     0.23
    "-wide"    0.23
    "ALLY"     0.22
    "ality"    0.22
    Activation Density: 0.016%

    No Known Activations