INDEX
    Explanations

    proper nouns, particularly names of people

    proper nouns, likely related to people's names or entities

    Negative Logits
    'pired'           -0.67
    ' existed'        -0.63
    ' exists'         -0.59
    ' compromises'    -0.57
    ' subs'           -0.57
    'pires'           -0.57
    ' finds'          -0.57
    ' hath'           -0.56
    'destroy'         -0.56
    ' doesnt'         -0.56
    Positive Logits
    '.'               1.03
    ' sarcast'        0.89
    ' rhet'           0.77
    '.</'             0.76
    '_.'              0.75
    '.['              0.74
    '.('              0.72
    '.<'              0.71
    '."'              0.71
    ' via'            0.69
    Activation Density: 0.177%

    No Known Activations