INDEX
    Explanations

    phrases that indicate confrontation or conflict

    Head Attr Weights
    0:0.02
    1:0.03
    2:0.06
    3:0.25
    4:0.01
    5:0.03
    6:0.07
    7:0.14
    8:0.05
    9:0.12
    10:0.06
    11:0.12
    Negative Logits
    " Helpful"       -1.29
    "stadt"          -1.28
    "enthal"         -1.13
    "nor"            -1.09
    "ERSON"          -1.07
    "hift"           -1.06
    "Requires"       -1.05
    " projecting"    -1.05
    "donald"         -1.05
    "enhagen"        -1.03
    Positive Logits
    " ensued"         1.27
    " Achilles"       1.16
    " Atlantis"       1.14
    "emate"           1.10
    "weed"            1.09
    " foe"            1.09
    " advers"         1.08
    " Bros"           1.08
    " Rabbit"         1.07
    " raged"          1.07
    Act Density 0.014%

    No Known Activations