INDEX
    Explanations

    mentions of values, especially in the context of principles, ethics, and belief systems

    references to ethical or moral values

    Negative Logits
    "女"        -0.79
    "geon"      -0.73
    "igans"     -0.71
    "waves"     -0.69
    "sie"       -0.68
    "bare"      -0.68
    "rontal"    -0.68
    "jen"       -0.67
    "wa"        -0.64
    "Sus"       -0.63
    Positive Logits
    " ideals"          0.77
    " values"          0.77
    " tolerance"       0.75
    "iblings"          0.74
    " principles"      0.71
    " proposition"     0.69
    " guiding"         0.68
    " beliefs"         0.68
    " sensibilities"   0.68
    " propositions"    0.66
    Act Density 0.013%

    No Known Activations