    Explanations

    words related to significant choices or actions

    references to significant decisions

    Negative Logits (tokens quoted; a leading space is part of the token)
        "vae"         -0.77
        "english"     -0.72
        " havoc"      -0.67
        "amen"        -0.67
        "icas"        -0.67
        " tremend"    -0.67
        "outh"        -0.67
        "uum"         -0.66
        "ighth"       -0.65
        "ingers"      -0.65

    Positive Logits
        " makers"     1.01
        "jar"         0.94
        " maker"      0.86
        " decision"   0.83
        "making"      0.81
        " ACTIONS"    0.80
        "maker"       0.78
        " decisions"  0.77
        "makers"      0.75
        "lessness"    0.71
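    The logit lists rank vocabulary tokens by a per-token weight. A common way to produce such weights for a sparse-autoencoder feature is to project the feature's decoder direction through the model's unembedding matrix. The sketch below assumes that method (the page does not say how its values were computed), and every name in it (`decoder_dir`, `W_U`, `vocab`, `logit_weights`, `top_and_bottom`) is hypothetical. It uses plain lists so it runs without NumPy; the toy numbers are made up.

```python
# Hypothetical sketch: rank tokens by how strongly a feature's decoder
# direction raises or lowers their logits. Assumes weight[t] = decoder_dir . W_U[:, t];
# the source page does not state how its logit values were computed.
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Dot the decoder direction (length d_model) with each column of
    W_U (d_model rows x vocab_size columns), giving one weight per token."""
    vocab_size = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab_size)]


def top_and_bottom(weights: Sequence[float], vocab: Sequence[str], k: int = 10
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-weight and k lowest-weight (token, weight) pairs,
    each list ordered from the most extreme value inward."""
    order = sorted(range(len(weights)), key=lambda t: weights[t])
    negative = [(vocab[t], weights[t]) for t in order[:k]]
    positive = [(vocab[t], weights[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocab_size = 3).
decoder_dir = [1.0, 0.5]
W_U = [[0.2, -0.4, 0.9],
       [0.6, 0.1, -0.2]]
vocab = [" decision", " havoc", " makers"]

weights = logit_weights(decoder_dir, W_U)
positive, negative = top_and_bottom(weights, vocab, k=1)
print(positive)
print(negative)
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; with a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid the full sort.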
    Activation density: 0.037%
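    Activation density is the share of token positions on which the feature fires; 0.037% corresponds to roughly 37 of every 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero (the page does not give its threshold); `activation_density` and the toy data are hypothetical.

```python
# Hypothetical sketch: activation density = fraction of token positions
# where a feature's activation exceeds a threshold (assumed 0.0 here).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions with activation > threshold."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: 37 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(37):
    acts[i * 1000] = 1.0

density = activation_density(acts)
print(f"{density * 100:.3f}%")  # prints 0.037%
```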

    No Known Activations