INDEX
    Explanations

    words related to challenging or being challenged

    concepts and actions related to challenging established ideas or societal norms

    Negative Logits
    Token       Logit
    "gp"        -0.73
    "storage"   -0.68
    "··"        -0.67
    "ng"        -0.67
    "pool"      -0.64
    "gas"       -0.64
    "]}"        -0.63
    "bath"      -0.63
    "abouts"    -0.62
    "anuts"     -0.61
    Positive Logits
    Token               Logit
    " precon"           1.36
    " stereotypes"      1.31
    " assumptions"      1.30
    " orthodoxy"        1.14
    " misconceptions"   1.13
    " myths"            1.09
    " conventional"     1.07
    " prevailing"       1.04
    " notions"          1.02
    " beliefs"          1.01
    Act Density 0.249%

    No Known Activations