INDEX
    Explanations

    principles, ethics, and morality-related phrases

    phrases related to ethics and moral principles

    Negative Logits
    acas    -0.63
    dor     -0.62
    zhou    -0.61
    é¾į     -0.59
    apons   -0.58
    ij      -0.58
    使       -0.57
    cffff   -0.55
    cli     -0.55
    scar    -0.55
    Positive Logits
     coincidence   0.82
     nutshell      0.68
     shenan        0.67
     itch          0.63
    brainer        0.63
     pecul         0.62
     kinda         0.58
     creek         0.57
     sill          0.57
    !"             0.56
    Act Density 0.988%

    No Known Activations