    Explanations

    phrases indicating feelings of luck, privilege, and opportunities

    Negative Logits
    Token           Logit
    " sympathy"     -0.17
    "swick"         -0.16
    "arness"        -0.16
    "arius"         -0.15
    "ilot"          -0.14
    "失"            -0.14
    "ramid"         -0.14
    "orno"          -0.14
    " uc"           -0.14
    "Reflection"    -0.14
    Positive Logits
    Token           Logit
    " Priv"         0.17
    " privilege"    0.16
    "IES"           0.16
    "uppe"          0.15
    "/gui"          0.14
    "iesen"         0.14
    " privileged"   0.14
    "TextLabel"     0.14
    " priv"         0.14
    ".selection"    0.13
    Activation Density: 0.048%

    No Known Activations