INDEX
    Explanations

    phrases that relate to community or social groups

    Negative Logits
    " strides"          -0.64
    " incidental"       -0.64
    "*/("               -0.63
    "ambers"            -0.62
    "naire"             -0.60
    "onne"              -0.59
    "rums"              -0.59
    " assertions"       -0.58
    " absorb"           -0.58
    " probabilities"    -0.58
    Positive Logits
    " ours"              0.76
    "rica"               0.75
    " Humanity"          0.70
    " Israel"            0.68
    " Mine"              0.68
    " hers"              0.68
    "tnc"                0.67
    " Charity"           0.66
    " mine"              0.65
    "Ju"                 0.65
    Act Density 0.032%

    No Known Activations