    Explanations

    phrases related to asserting opinions or beliefs

    actions related to coercion or mandatory requirements

    Negative Logits
    Dim           -0.70
    ,—            -0.65
    dim           -0.63
    see           -0.61
    RET           -0.60
    burning       -0.60
    ersen         -0.60
    uyomi         -0.59
    Interested    -0.58
    .........     -0.56
    Positive Logits
     oneself      0.81
     entails      0.76
     yourself     0.67
     involves     0.67
     isn          0.66
    ealous        0.66
     helps        0.65
     doesn        0.64
     someone      0.64
     truthful     0.61
    Activation Density 0.299%

    No Known Activations