INDEX
    Explanations

    references to emotions or reactions within social contexts

    Head Attr Weights
    0:0.02
    1:0.02
    2:0.08
    3:0.06
    4:0.21
    5:0.03
    6:0.18
    7:0.12
    8:0.03
    9:0.07
    10:0.06
    11:0.07
    Negative Logits
     flashlight  -1.24
     ALWAYS  -1.19
     masturb  -1.19
    Shift  -1.14
     foreskin  -1.13
    Cola  -1.12
    istries  -1.12
     guided  -1.11
     recip  -1.11
     successfully  -1.10
    Positive Logits
     Horowitz  1.39
     Colomb  1.36
     Santos  1.36
     Byrne  1.26
     Brock  1.25
     Vale  1.23
    igans  1.21
     Regina  1.21
    ablishment  1.19
     Kop  1.19
    Act Density 0.001%

    No Known Activations