    Explanations

    negative interactions and conflicts between characters

    instances of body shaming and mockery in social interactions

    Negative Logits
    Token             Logit
    " contempl"       -0.65
    "erning"          -0.65
    " modesty"        -0.64
    " suitable"       -0.62
    "igmatic"         -0.60
    "icip"            -0.59
    "particularly"    -0.57
    " gazing"         -0.57
    " gaze"           -0.56
    "igent"           -0.55
    Positive Logits
    Token             Logit
    " fucked"         0.89
    " didnt"          0.88
    " THEN"           0.87
    " gonna"          0.86
    " blah"           0.85
    " yeah"           0.85
    " fuckin"         0.85
    " uh"             0.84
    " shit"           0.81
    "Yeah"            0.80
    Activation Density: 1.565%

    No Known Activations