    Explanations

    proper nouns, specifically those related to locations or names

    references to a sense of community and collective belonging

    Negative Logits
    Token           Logit
    " Rasmussen"    -0.67
    "OTT"           -0.65
    "FU"            -0.64
    " Logged"       -0.63
    "WAR"           -0.61
    "DERR"          -0.60
    " Crosby"       -0.59
    " âī¡"          -0.58
    "wei"           -0.58
    "STATE"         -0.57
    Positive Logits
    Token           Logit
    "selves"        1.25
    "neau"          1.22
    "neys"          1.04
    "izons"         1.00
    "cery"          0.97
    "dain"          0.97
    "¯¯¯¯¯¯¯¯"      0.92
    "ishment"       0.91
    "izont"         0.90
    "ishing"        0.84
    Activation Density: 0.022%

    No Known Activations