INDEX
    Explanations

    words related to describing physical attributes or characteristics

    concepts and phrases related to social roles and expectations

    Negative Logits
    ãĥ¼ãĥ³      -0.54
    ruce        -0.50
    aughtered   -0.48
    enger       -0.46
    liga        -0.46
    efer        -0.46
    arij        -0.45
    querade     -0.43
     Wem        -0.42
     Rib        -0.42
    Positive Logits
    entimes     0.60
    etheless    0.59
    POS         0.51
     behavi     0.51
    terness     0.50
    consider    0.50
     sugg       0.50
    especially  0.50
     nihil      0.49
     behaviors  0.48
    Activation Density: 1.526%

    No Known Activations