INDEX
    Explanations

    mentions or quotes involving personal actions or statements

    Negative Logits
    Token            Logit
    "stead"          -0.65
    "apo"            -0.63
    "gradation"      -0.63
    " Detected"      -0.61
    "belt"           -0.61
    "ablishment"     -0.61
    "sites"          -0.59
    "force"          -0.59
    "compan"         -0.59
    " chars"         -0.59

    Positive Logits
    Token            Logit
    " themselves"    0.76
    " herself"       0.72
    "onite"          0.69
    " remorse"       0.69
    " goodbye"       0.68
    " hello"         0.68
    " himself"       0.65
    "edly"           0.65
    " angrily"       0.65
    " aloud"         0.63
    Activation Density: 0.659%

    No Known Activations