INDEX
    Explanations

    mentions of different nationalities or ethnicities in sentences

    references to national entities or identities

    Negative Logits
    paren                 -0.78
    plet                  -0.77
    VID                   -0.75
    roll                  -0.74
    Nap                   -0.73
    ש                     -0.73
    netflix               -0.71
    thumbnails            -0.70
    amar                  -0.70
    isSpecialOrderable    -0.70
    Positive Logits
    etter                 0.85
     Peb                  0.71
     Matters              0.69
    ities                 0.68
    aurus                 0.68
     nerv                 0.65
    hower                 0.64
    eal                   0.64
    cape                  0.64
     Taj                  0.64
    Activation Density: 0.047%

    No Known Activations