    Explanations

    words related to ethics or moral reasoning

    Negative Logits
    Token           Logit
    "glers"         -0.92
    " Abyss"        -0.74
    "ERY"           -0.70
    " Cage"         -0.67
    " Leap"         -0.66
    " Coalition"    -0.66
    " Ducks"        -0.64
    " Bruins"       -0.64
    "ggle"          -0.63
    " Gru"          -0.61
    Positive Logits
    Token           Logit
    "utations"      1.44
    "ulsive"        1.30
    "ublic"         1.25
    "rehensible"    1.25
    "rieve"         1.23
    "roach"         1.20
    "orters"        1.20
    "rint"          1.19
    "ressed"        1.17
    "uted"          1.17
    Activation Density: 0.012%

    No Known Activations