INDEX
    Explanations

    statements about the importance of moral and physical character

    Negative Logits
     famously
    -0.17
     duk
    -0.15
    uci
    -0.15
    uur
    -0.15
    favor
    -0.15
    IColor
    -0.14
    abay
    -0.14
     जय
    -0.14
     basically
    -0.14
    icari
    -0.14
    Positive Logits
     addict
    0.18
     intr
    0.18
     fancy
    0.18
     essay
    0.17
     shrink
    0.17
     consent
    0.16
     sacr
    0.15
     hourly
    0.15
     docs
    0.15
     sha
    0.15
    Act Density 0.289%

    No Known Activations