    Explanations

    phrases relating to criticism or analysis of societal issues

    Head Attr Weights
    0:0.03
    1:0.02
    2:0.25
    3:0.14
    4:0.15
    5:0.06
    6:0.03
    7:0.04
    8:0.07
    9:0.05
    10:0.06
    11:0.05
    Negative Logits
    " flanked": -1.67
    " paused": -1.48
    " glanced": -1.45
    "ertodd": -1.39
    " urgently": -1.37
    "gently": -1.36
    " respectful": -1.35
    " nodded": -1.34
    " promptly": -1.34
    " coordinating": -1.33
    Positive Logits
    " nor": 2.17
    " predecessors": 2.03
    " counterparts": 1.76
    " predecessor": 1.76
    ">.": 1.73
    " attRot": 1.71
    "SPONSORED": 1.68
    " anymore": 1.62
    "ndra": 1.55
    "Nor": 1.55
    Act Density 0.489%

    No Known Activations