INDEX
    Explanations

    references to past actions or positions

    the word "previously" and its variations in context

    Negative Logits
    eer           -0.69
    ribution      -0.69
    letico        -0.66
    ocracy        -0.65
    pling         -0.65
    ging          -0.64
    Redditor      -0.64
    roller        -0.64
    Incre         -0.64
    lua           -0.62
    Positive Logits
     unsus            1.00
     unpublished      0.88
     unsuccessfully   0.82
     undisclosed      0.81
     disclosed        0.81
     incarcerated     0.79
     experimented     0.77
     held             0.76
     teased           0.76
     belonged         0.76
    Activation Density: 0.040%

    No Known Activations