INDEX
    Explanations

    phrases indicating a disconnection from reality or understanding

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.08
    3:0.09
    4:0.14
    5:0.03
    6:0.03
    7:0.36
    8:0.03
    9:0.03
    10:0.06
    11:0.08
    Negative Logits
    "ansom"        -1.83
    "etheus"       -1.79
    " slideshow"   -1.75
    "perse"        -1.67
    "leck"         -1.65
    "rones"        -1.62
    " roundup"     -1.60
    "ginx"         -1.56
    "gins"         -1.52
    "gallery"      -1.48
    Positive Logits
    " reality"         1.92
    " realities"       1.88
    " whence"          1.52
    " stereotype"      1.48
    " norm"            1.44
    " Establishment"   1.43
    " Saiyan"          1.39
    " Dealer"          1.38
    " sentiments"      1.38
    " stereotypes"     1.37
    Act Density 0.001%

    No Known Activations