INDEX
    Explanations

    mentions of specific entities or groups within broader topics

    instances of the word "including."

    Negative Logits
    rait        -0.88
    iny         -0.80
    uters       -0.79
    iri         -0.79
    iet         -0.78
    erb         -0.77
    ules        -0.74
    uay         -0.74
    endant      -0.73
    ifi         -0.73
    Positive Logits
     those        0.75
     ours         0.69
     yours        0.68
     NJ           0.64
     flashbacks   0.63
     hasht        0.62
     spoilers     0.61
     hypoc        0.61
     ones         0.61
    worth         0.60
    Activation Density: 0.069%

    No Known Activations