INDEX
    Explanations

    words related to comparisons or contrasts

    Negative Logits
    "hood"        -0.70
    "sat"         -0.69
    "SPONSORED"   -0.64
    "³³³³³³³³"    -0.63
    "deen"        -0.63
    "fair"        -0.62
    "sylvania"    -0.62
    "mouth"       -0.62
    "sov"         -0.60
    "she"         -0.59
    Positive Logits
    " regards"    2.02
    " regard"     1.90
    " respect"    1.52
    "draw"        1.47
    "stood"       1.44
    "standing"    1.27
    "drawn"       1.22
    "holding"     1.19
    " impunity"   1.12
    " hindsight"  1.03
    Activation Density 0.189%
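Activation density is usually read as the share of token positions on which the feature fires. Under that reading, 0.189% means roughly 1.89 activating tokens per 1,000 tokens. The minimal sketch below assumes "fires" means an activation above zero. `activation_density` and its `threshold` parameter are hypothetical illustrations, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)  # count of positions that fire
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Toy example: a feature that fires on 2 of 1,000 tokens.
    acts = [0.0] * 1000
    acts[10] = 3.2
    acts[500] = 0.7
    print(activation_density(acts))  # 0.2
```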

    No Known Activations