INDEX
    Explanations

    instances where a decision is being discussed

    references to decisions made in various contexts

    Negative Logits
    Token         Logit
    "icas"        -0.72
    "vae"         -0.71
    "ingers"      -0.68
    "icum"        -0.66
    " havoc"      -0.66
    " tert"       -0.65
    "ubric"       -0.64
    "aband"       -0.64
    " Offense"    -0.63
    "uction"      -0.63
    Positive Logits
    Token         Logit
    " makers"     1.04
    " maker"      0.87
    "makers"      0.81
    "maker"       0.81
    "making"      0.79
    "jar"         0.79
    " ACTIONS"    0.75
    " taken"      0.72
    " decision"   0.72
    " to"         0.72
    Activation Density: 0.048%

    No Known Activations