INDEX
    Explanations

    mentions of the American population or specific demographics within America

    mentions of "Americans" and "Canadians."

    Negative Logits
    Token            Logit
    "Initialized"    -0.72
    "bol"            -0.64
    "cer"            -0.64
    " Guru"          -0.63
    "Drag"           -0.62
    "selection"      -0.62
    " Malaysia"      -0.62
    " Nanto"         -0.61
    "ring"           -0.61
    "efully"         -0.61
    Positive Logits
    Token            Logit
    "hip"            0.99
    "'"              0.82
    " tuned"         0.80
    "ourcing"        0.79
    "ugi"            0.77
    " living"        0.77
    "ourced"         0.76
    " distrust"      0.74
    " who"           0.74
    "hips"           0.74
    Activation Density: 0.066%

    No Known Activations