INDEX
    Explanations

    statements about morality and ethics

    Negative Logits
     Nightmares   -0.67
    obbies        -0.65
    weights       -0.65
    aneers        -0.63
     Via          -0.62
     Messenger    -0.60
     Coach        -0.60
     Audit        -0.60
     Cors         -0.60
     Dreams       -0.59

    Positive Logits
    omorphic      1.17
    rael          1.09
    olated        1.05
    olation       1.03
    nt            0.94
    olate         0.94
    senal         0.90
    othermal      0.89
    gur           0.88
    omorph        0.85
    Act Density 0.110%

    No Known Activations