INDEX
    Explanations

    questions and assertions about societal values and ethics

    Negative Logits
    "ares"         -0.16
    "omet"         -0.14
    "imulator"     -0.14
    "kus"          -0.14
    "cdf"          -0.14
    "Finder"       -0.14
    " hat"         -0.14
    " Brew"        -0.13
    " Sims"        -0.13
    "bah"          -0.13
    Positive Logits
    " serious"      0.28
    " Serious"      0.23
    " worth"        0.23
    " sane"         0.23
    "serious"       0.22
    " civilized"    0.21
    " Worth"        0.21
    " sensible"     0.20
    " anyone"       0.20
    " decent"       0.20
    Activation Density: 0.190%

    No Known Activations