INDEX
    Explanations

    words related to personal attributes, feelings, and actions

    expressions of luck and fortune

    Negative Logits
    Token             Logit
    "},"              -0.65
     harming          -0.63
     Achieve          -0.63
     etc              -0.63
     Mankind          -0.62
    ".[               -0.61
     harmful          -0.60
     discriminatory   -0.56
    .","              -0.56
    çķ                -0.55
    Positive Logits
    Token             Logit
     guesses          0.94
     caveat           0.90
     analogy          0.87
     caveats          0.82
     assumption       0.82
    prisingly         0.75
     disclaimer       0.74
     spoilers         0.74
     understatement   0.72
     guess            0.71
    Activation Density: 0.782%

    No Known Activations