INDEX
    Explanations

    phrases related to social or political rejection or disapproval

    Head Attr Weights
      Head    0     1     2     3     4     5     6     7     8     9     10    11
      Weight  0.08  0.03  0.19  0.07  0.12  0.08  0.03  0.02  0.14  0.10  0.05  0.03
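
Ranking the heads by weight shows where the attribution concentrates: heads 2, 8 and 4 carry the three largest weights, and the twelve listed (rounded) weights sum to 0.94. A minimal sketch of that reading, using only the values listed above; the names HEAD_ATTR_WEIGHTS, top_heads and total_weight are hypothetical.

```python
# Head attribution weights exactly as listed in the dashboard (head index -> weight).
HEAD_ATTR_WEIGHTS = {
    0: 0.08, 1: 0.03, 2: 0.19, 3: 0.07, 4: 0.12, 5: 0.08,
    6: 0.03, 7: 0.02, 8: 0.14, 9: 0.10, 10: 0.05, 11: 0.03,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, largest first.

    Ties are broken by the lower head index so the ordering is deterministic.
    """
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))[:k]


def total_weight(weights):
    """Sum of all listed weights, rounded to 2 decimals to hide float noise."""
    return round(sum(weights.values()), 2)


if __name__ == "__main__":
    for head, weight in top_heads(HEAD_ATTR_WEIGHTS):
        print(f"head {head}: {weight:.2f}")
    print("sum of listed weights:", total_weight(HEAD_ATTR_WEIGHTS))
```
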
    Negative Logits
      " Angelo"      -1.14
      " Cec"         -1.11
      "insula"       -1.09
      " Cinem"       -1.05
      " Lund"        -1.04
      " Surviv"      -1.04
      " Romero"      -1.03
      " Omaha"       -1.03
      " Nebula"      -1.02
      " Philipp"     -1.02
    Positive Logits
      "ppings"       1.25
      " virtues"     1.25
      "cuts"         1.24
      "bole"         1.21
      " altogether"  1.20
      " charms"      1.16
      "agin"         1.12
      "Downloadha"   1.11
      "ocratic"      1.11
      "roots"        1.11
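
The dashboard does not say how these logit lists were computed. One common approach, assumed here, is to project the feature's output direction through the model's unembedding matrix, giving one logit effect per vocabulary token, and then keep the k most positive and k most negative entries. A minimal pure-Python sketch of that assumed procedure on a toy 2-dimensional model with four hypothetical tokens; logit_effects and top_and_bottom are illustrative names, not a library API.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's output direction onto every vocabulary token.

    direction has length d_model; unembed is a d_model x vocab matrix given
    as a list of rows. Returns one logit effect per token (direction @ unembed).
    """
    vocab = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(vocab)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) (token, effect) pairs."""
    pairs = list(zip(tokens, effects))
    top = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]  # largest first
    bottom = sorted(pairs, key=lambda p: p[1])[:k]             # most negative first
    return top, bottom


if __name__ == "__main__":
    # Hypothetical toy values: d_model = 2, vocab = 4.
    tokens = ["a", "b", "c", "d"]
    direction = [1.0, -1.0]
    unembed = [[1, 0, 2, 0],
               [0, 1, 0, 3]]
    effects = logit_effects(direction, unembed)
    top, bottom = top_and_bottom(effects, tokens, k=2)
    print("positive:", top)
    print("negative:", bottom)
```

Plain lists keep the sketch dependency-free; with real model weights the same projection would normally be a single matrix product in an array library.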
    Activation Density: 0.002%
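
An activation density of 0.002% is a fraction of 0.00002: roughly 2 tokens in every 100,000, or about one token in 50,000. A minimal sketch of that arithmetic, assuming density means the fraction of tokens on which the feature has a nonzero activation (the dashboard does not state its definition); the function names are hypothetical.

```python
def density_fraction(percent: float) -> float:
    """Convert a density given in percent (e.g. 0.002) into a plain fraction."""
    return percent / 100.0


def expected_active_tokens(percent: float, n_tokens: int) -> float:
    """Expected number of activating tokens in a sample of n_tokens,
    assuming density = fraction of tokens with a nonzero activation."""
    return density_fraction(percent) * n_tokens


def tokens_per_activation(percent: float) -> float:
    """Average number of tokens between activations (1 / fraction)."""
    return 1.0 / density_fraction(percent)


if __name__ == "__main__":
    print(f"fraction: {density_fraction(0.002):.6f}")
    print(f"per 1M tokens: {expected_active_tokens(0.002, 1_000_000):.1f}")
    print(f"one in every {tokens_per_activation(0.002):,.0f} tokens")
```
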

    No Known Activations