INDEX
    Explanations

    mentions of individuals and their actions or statements

    Negative Logits
    "ãĤ¼"            -0.65
    " Enhance"       -0.65
    " Beaut"         -0.64
    " Dise"          -0.63
    "ishable"        -0.63
    " Travels"       -0.62
    " Masquerade"    -0.61
    " Cutting"       -0.61
    " Patch"         -0.61
    " MAP"           -0.61
    Positive Logits
    " replied"       1.53
    " answered"      1.40
    " reply"         1.38
    " replies"       1.38
    " responded"     1.34
    " answer"        1.29
    " hesitated"     1.25
    " answers"       1.16
    " answ"          1.12
    "Answer"         1.11
    Act Density 0.201%

    No Known Activations