INDEX
    Explanations

    phrases related to accountability and public scrutiny

    Head Attribution Weights
    Head    0     1     2     3     4     5     6     7     8     9     10    11
    Weight  0.02  0.02  0.15  0.40  0.07  0.03  0.03  0.04  0.03  0.05  0.06  0.06
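The weights above appear to give each attention head's share of the feature's attribution, with head 3 carrying about 0.40. If they are normalized shares, two-decimal rounding would explain why the listed values sum to 0.96 rather than exactly 1. Below is a minimal sketch of one way such a breakdown could be computed. It assumes two things: the feature's decoder vector lives in the concatenated per-head output space (n_heads × d_head), and each head's weight is its slice's L2 norm divided by the sum over all heads. The function name `head_attr_weights` and this normalization are hypothetical, not the dashboard's confirmed method.

```python
import math

def head_attr_weights(decoder_vec, n_heads):
    """Hypothetical: split a decoder vector into equal per-head slices and
    return each slice's L2 norm as a fraction of the total across heads."""
    if len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder length must be divisible by n_heads")
    d_head = len(decoder_vec) // n_heads
    norms = []
    for h in range(n_heads):
        chunk = decoder_vec[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in chunk)))
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads  # an all-zero decoder has no attribution
    return [n / total for n in norms]

# Toy example: 3 heads, d_head = 2 (made-up numbers, not this feature's weights).
weights = head_attr_weights([3.0, 4.0, 0.0, 0.0, 0.0, 5.0], n_heads=3)
print([round(w, 2) for w in weights])  # [0.5, 0.0, 0.5]
```

Normalizing by the sum of norms makes the weights add to 1, so a single dominant head (like head 3 here) stands out directly as a large fraction.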
    Negative Logits
    )."     -1.79
    igi     -1.72
    )"      -1.67
    )</     -1.66
    iHUD    -1.49
    acia    -1.49
    obi     -1.46
    ),"     -1.46
    OV      -1.46
    ofi     -1.44
    Positive Logits
     awoken          1.74
     worse           1.70
     somew           1.69
     subconscious    1.65
    might            1.58
    enough           1.54
     Shit            1.53
     somehow         1.52
    better           1.50
     doomed          1.50
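The two logit lists read like the tokens whose logits the feature most suppresses and most boosts. Below is a minimal sketch of how such lists could be produced. It assumes the scores come from projecting the feature's output direction (a length-d_model vector) through the model's unembedding matrix and then taking the k lowest and k highest entries. `top_logits`, the toy vocabulary, and the toy matrix are hypothetical. For an attention-output feature, the decoder direction may first need mapping into the residual stream before this projection.

```python
def top_logits(direction, w_u, vocab, k):
    """Hypothetical: project `direction` (length d_model) through the
    unembedding `w_u` (d_model rows x vocab_size columns) and return the
    k most negative and k most positive (token, logit) pairs."""
    d_model = len(direction)
    logits = [
        sum(direction[i] * w_u[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = sorted(zip(vocab, logits), key=lambda p: p[1])
    negative = pairs[:k]          # lowest logits, most negative first
    positive = pairs[::-1][:k]    # highest logits, most positive first
    return negative, positive

# Toy setup: d_model = 2, a vocabulary of 4 tokens, made-up weights.
vocab = [" worse", ')"', " doomed", "igi"]
w_u = [
    [2.0, -1.0, 0.5, -0.5],
    [0.5, -0.5, 1.0, -2.0],
]
neg, pos = top_logits([1.0, 1.0], w_u, vocab, k=2)
print(pos)  # [(' worse', 2.5), (' doomed', 1.5)]
```

Keeping the leading space in tokens such as `" worse"` matters. Many tokenizers treat a word with and without a preceding space as different tokens, which is why the lists above show both forms.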
    Act Density 0.171%
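Act Density reads as the fraction of tokens on which the feature fires. At 0.171%, that works out to roughly 1.7 activations per 1,000 tokens. Below is a minimal sketch of the calculation. It assumes "fires" means an activation strictly greater than zero over some sample of tokens. The function name `activation_density`, its `threshold` parameter, and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical: percentage of tokens whose feature activation
    exceeds `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy sample of 2,000 token activations with 3 nonzero entries.
acts = [0.0] * 2000
acts[10], acts[500], acts[1999] = 1.2, 0.4, 3.1
print(f"{activation_density(acts):.3f}%")  # 0.150%
```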

    No Known Activations