INDEX
    Explanations

    personal pronouns and verbs indicating speaking, writing, or explaining from a first-person perspective

    Negative Logits
    Token          Logit
    "noon"         -0.73
    "acters"       -0.69
    "estial"       -0.63
    "ictional"     -0.61
    "rocket"       -0.60
    "selection"    -0.60
    "iencies"      -0.60
    " torch"       -0.59
    "paralle"      -0.59
    " cloning"     -0.58
    Positive Logits
    Token          Logit
    " said"        1.47
    "said"         1.25
    " wrote"       1.23
    " says"        1.23
    " exclaimed"   1.21
    " explained"   1.14
    "Said"         1.13
    " replied"     1.13
    " joked"       1.10
    " told"        1.09
    Activation Density: 0.836%

    No Known Activations