INDEX
    Explanations

    phrases indicating a desire or intention to do something

    expressions of desire or intention

    Negative Logits
    stand           -0.67
    bug             -0.65
    ibliography     -0.64
    trust           -0.64
    aka             -0.61
     ®              -0.60
    alias           -0.60
    voc             -0.59
    manship         -0.59
    iop             -0.57
    Positive Logits
     clarification  0.79
     revenge        0.77
     to             0.72
     something      0.71
     answers        0.66
     permission     0.66
     clarity        0.64
     desperately    0.61
     attention      0.61
    ":[{"           0.60
    Activation Density: 0.093%

    No Known Activations