INDEX
    Explanations

    sentences containing personal comments or reflections

    expressions of personal sentiments or reflections on societal issues

    Negative Logits
    Token             Logit
    "secret"          -0.66
    "-"               -0.64
    "amon"            -0.61
    " speculation"    -0.60
    "sbm"             -0.59
    "indal"           -0.58
    " unknown"        -0.57
    " disastrous"     -0.56
    "unknown"         -0.56
    " devastating"    -0.55

    Positive Logits
    Token             Logit
    " sanity"          1.11
    " decency"         1.08
    " sane"            1.04
    "respect"          1.04
    " calmed"          0.98
    " honesty"         0.93
    " unbiased"        0.93
    "cknow"            0.93
    " honest"          0.92
    " sensible"        0.90
    Activation Density: 1.540%

    No Known Activations