INDEX
    Explanations

    words associated with identification or categorization, such as markers or gender markers

    references to markers that signify important or distinguishing features in various contexts

    Negative Logits
    erest     -0.86
    orld      -0.84
    ibaba     -0.78
    obbies    -0.77
    é¾        -0.77
    ILLE      -0.76
    ategory   -0.75
    rina      -0.74
    acia      -0.74
    awar      -0.74
    Positive Logits
     marker       1.34
     markers      1.24
    posts         0.85
     marking      0.79
    holder        0.76
     plaque       0.72
     dotted       0.69
     pens         0.69
     flare        0.69
     indicating   0.68
    Act Density 0.007%

    No Known Activations