INDEX
    Explanations

    instances of being targeted or labeled as targets in various contexts

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.12
    3:0.05
    4:0.21
    5:0.03
    6:0.04
    7:0.28
    8:0.04
    9:0.03
    10:0.05
    11:0.06
    Negative Logits
    heit   -1.74
    ACTED  -1.50
    VOL    -1.48
    izont  -1.41
    ctrl   -1.40
           -1.38
    ROM    -1.38
    hop    -1.36
    verbs  -1.36
    redo   -1.34
    Positive Logits
     trespass      1.57
     ridicule      1.57
     intrusion     1.41
     tresp         1.37
     sidel         1.37
     destro        1.35
     skelet        1.35
     Barron        1.35
     unreasonable  1.35
     pilgr         1.34
    Act Density 0.003%

    No Known Activations