INDEX
    Explanations

    phrases related to doing things correctly or in the right way

    phrases related to moral or ethical decision-making

    Negative Logits
    "urated"         -0.71
    "raltar"         -0.69
    " arsen"         -0.68
    " lodged"        -0.67
    "icia"           -0.65
    " vanquished"    -0.62
    "angered"        -0.61
    "edia"           -0.61
    "hner"           -0.61
    " settled"       -0.59
    Positive Logits
    " thing"         1.24
    " chores"        1.16
    " things"        1.06
    " stunts"        1.05
    " tasks"         1.03
    " job"           1.00
    " homework"      0.99
    " flips"         0.96
    " calculations"  0.96
    "thing"          0.94
    Act Density 0.242%

    No Known Activations