INDEX
    Explanations

    phrases related to communication and information sharing

    phrases related to communication or requests for input

    Negative Logits
    Token            Logit
    "Distance"       -0.55
    "animate"        -0.54
    " draining"      -0.53
    " predators"     -0.51
    "enery"          -0.51
    " raping"        -0.50
    " Vegeta"        -0.49
    " murdering"     -0.49
    " hurting"       -0.49
    " Females"       -0.49
    Positive Logits
    Token            Logit
    " archive"       0.78
    "published"      0.74
    " publish"       0.73
    " redacted"      0.73
    " reader"        0.73
    " informative"   0.73
    " editor"        0.71
    " publication"   0.70
    " edited"        0.67
    " excerpts"      0.67
    Activation Density: 1.988%

    No Known Activations