INDEX
    Explanations

    references to societal issues and norms

    Head Attr Weights
    0:0.02
    1:0.02
    2:0.11
    3:0.13
    4:0.06
    5:0.03
    6:0.05
    7:0.03
    8:0.27
    9:0.11
    10:0.08
    11:0.04
    Negative Logits
     Schne          -1.65
     Jagu           -1.57
     Unch           -1.57
     Haas           -1.52
    FTWARE          -1.51
    ADRA            -1.49
     Hond           -1.48
     succession     -1.44
     Jere           -1.44
     jarring        -1.43
    Positive Logits
    0010            2.07
    cms             2.06
    english         2.03
    Reviewer        1.92
    vor             1.77
     サーティワン         1.77
    odi             1.75
    nr              1.72
    rw              1.69
    widget          1.68
    Act Density 0.103%

    No Known Activations