INDEX
    Explanations

    phrases or words related to justifying actions or decisions

    terms related to justifying actions or decisions

    Negative Logits
    Token           Logit
    "OGR"           -0.70
    "semble"        -0.69
    "ngth"          -0.68
    "chn"           -0.67
    "INFO"          -0.66
    "Sym"           -0.66
    "ocry"          -0.65
    " clue"         -0.64
    " nurs"         -0.63
    "ovych"         -0.63
    Positive Logits
    Token           Logit
    " inaction"     1.02
    " why"          0.97
    " spending"     0.92
    " cance"        0.86
    " banning"      0.86
    " abandoning"   0.83
    " justifying"   0.82
    " sacrificing"  0.81
    " postp"        0.80
    " imposing"     0.80
    Activation Density: 0.046%

    No Known Activations