INDEX
    Explanations

    phrases related to confrontation and personal accountability in discussions

    Head Attr Weights
    0:0.01
    1:0.03
    2:0.08
    3:0.06
    4:0.02
    5:0.03
    6:0.06
    7:0.09
    8:0.25
    9:0.08
    10:0.11
    11:0.12
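
A minimal sketch for reading the list above, assuming each entry is `head index:weight` and that larger weights mean a larger share of attribution to that attention head. The names `head_attr_weights` and `top_heads` are hypothetical, introduced only for illustration. The listed values sum to 0.94 rather than 1.00, which may simply reflect rounding to two decimals.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_attr_weights = {
    0: 0.01, 1: 0.03, 2: 0.08, 3: 0.06, 4: 0.02, 5: 0.03,
    6: 0.06, 7: 0.09, 8: 0.25, 9: 0.08, 10: 0.11, 11: 0.12,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the listed weights, rounded to the precision used in the list.
total = round(sum(head_attr_weights.values()), 2)

print(top_heads(head_attr_weights))
print(total)
```

Under these assumptions, head 8 carries the largest single weight (0.25), followed by heads 11 and 10.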
    Negative Logits
    iggins           -1.18
    elo              -1.16
    (blank)          -1.09
    prise            -1.09
    (blank)          -1.08
    ipel             -1.04
    adelphia         -1.01
    eger             -1.00
    ás               -1.00
    alg              -0.99
    Positive Logits
     hadn             1.14
    ');               1.12
    ')                1.09
     clicked          1.02
     discriminated    1.01
    '),               1.00
     existed          1.00
     didnt            0.99
     lied             0.96
    chwitz            0.95
    Act Density 0.003%

    No Known Activations