INDEX
    Explanations

    sentences that convey ethical or moral implications related to various topics

    Head Attr Weights
    0:0.07
    1:0.02
    2:0.17
    3:0.27
    4:0.05
    5:0.03
    6:0.04
    7:0.03
    8:0.06
    9:0.08
    10:0.08
    11:0.05
    Negative Logits
    —"
    -2.23
     McCull
    -1.76
     Ancients
    -1.67
     Guant
    -1.62
     Bill
    -1.59
     Pu
    -1.57
     Chron
    -1.55
     Arist
    -1.53
     Cele
    -1.52
     Newsp
    -1.52
    Positive Logits
    wx
    1.93
    etheless
    1.91
    "]=>
    1.84
    plugin
    1.82
    sonian
    1.77
    ISON
    1.76
    PDATED
    1.76
     NOTE
    1.74
    displayText
    1.71
     EDIT
    1.71
    Act Density 0.057%

    No Known Activations