    Explanations

    quotes that express opinions or statements about individuals or groups

    Head Attr Weights
    0:0.20
    1:0.04
    2:0.06
    3:0.12
    4:0.03
    5:0.16
    6:0.05
    7:0.04
    8:0.04
    9:0.07
    10:0.11
    11:0.04
    Negative Logits
    byss        -1.30
                -1.21
    qu          -1.18
                -1.14
     UCHIJ      -1.13
    quished     -1.13
                -1.11
     :=         -1.09
                -1.08
     gal        -1.05
    Positive Logits
    .")         2.28
                2.13
    ),"         2.09
    )."         2.02
    ]."         1.95
    ']          1.89
    ").         1.88
    "]          1.86
    "),         1.85
    ")          1.83
    Act Density 0.007%

    No Known Activations