    Explanations

    phrases related to moral principles or ethical standards

    Negative Logits
    Token             Logit
    "ittle"           -0.68
    "geon"            -0.67
    "sites"           -0.67
    "sie"             -0.65
    "fac"             -0.65
    "availability"    -0.65
    "azar"            -0.65
    "ilant"           -0.65
    "arf"             -0.64
    "Raid"            -0.63
    Positive Logits
    Token             Logit
    " ideals"         0.95
    " principles"     0.90
    " embodied"       0.87
    " beliefs"        0.83
    " Values"         0.82
    " values"         0.82
    " creed"          0.79
    " diversity"      0.78
    " enshr"          0.78
    " tenets"         0.76
    Activation Density: 0.049%

    No Known Activations