INDEX
    Explanations

    phrases indicating contradictions, complexities, and societal frustrations

    Head Attr Weights
    0:0.02
    1:0.04
    2:0.12
    3:0.04
    4:0.01
    5:0.03
    6:0.13
    7:0.09
    8:0.07
    9:0.06
    10:0.09
    11:0.26
    Negative Logits
     Alone
    -1.20
     consent
    -1.16
    mins
    -1.10
    xton
    -1.07
     anymore
    -1.06
     twitch
    -1.05
    Desktop
    -1.04
     immunity
    -1.00
    debian
    -1.00
     alone
    -0.99
    Positive Logits
    than
    1.90
     than
    1.82
     Than
    1.42
    eem
    1.32
    !--
    1.28
    Reviewer
    1.28
    VERTISEMENT
    1.27
    1.27
    !).
    1.25
    1.25
    Act Density 0.036%

    No Known Activations