INDEX
    Explanations

    words related to non-standard or unconventional practices, often in the context of different categories such as immigrants, military, food, religion, and state

    terms associated with categories and classifications, particularly around non-conformity and specific identity groups

    Negative Logits
    Token           Logit
    "CHAT"          -0.78
    "adder"         -0.72
    "AMS"           -0.67
    "Dialogue"      -0.66
    "oglu"          -0.66
    "MAC"           -0.64
    "wagen"         -0.64
    "brance"        -0.64
    "isu"           -0.61
    "frey"          -0.61
    Positive Logits
    Token           Logit
    "theless"       0.96
    "withstanding"  0.85
    "ensical"       0.82
    "existent"      0.78
    "istant"        0.77
    " whatsoever"   0.77
    "ensable"       0.76
    "igenous"       0.76
    " nor"          0.75
    "roleum"        0.73
    Activation Density: 0.072%

    No Known Activations