INDEX
    Explanations

    references to various national or ethnic identities

    Negative Logits
    Token      Logit
    lessly     -0.19
    PÅĻÃŃ      -0.17
    lying      -0.15
    ´s         -0.15
    ymax       -0.15
    ptron      -0.15
    evi        -0.15
    panies     -0.15
    ful        -0.14
     itself    -0.14
    Positive Logits
    Token         Logit
     who          0.31
    who           0.26
    '             0.26
     whom         0.22
                  0.21
    cape          0.19
    -Americans    0.18
    -American     0.18
    que           0.18
    ided          0.18
    Activation Density: 0.088%

    No Known Activations