    Explanations

    phrases that convey distancing or separation from others or concepts

    Negative Logits
    "swick"     -0.68
    "chance"    -0.65
    "rano"      -0.64
    "iop"       -0.63
    "orable"    -0.63
    "nosis"     -0.63
    "opus"      -0.60
    "ores"      -0.59
    "frey"      -0.59
    "amac"      -0.58
    Positive Logits
    " oneself"       0.95
    " themselves"    0.83
    " herself"       0.82
    " myself"        0.82
    " himself"       0.80
    " ourselves"     0.80
    "iates"          0.78
    " yourselves"    0.75
    "iveness"        0.73
    " yourself"      0.72
    Activation Density: 0.012%

    No Known Activations