INDEX
    Explanations

    expressions of desires and goals

    expressions of desire or intent

    Negative Logits
    asse        -0.71
    rir         -0.67
    frac        -0.66
    icol        -0.64
    icist       -0.64
    NVIDIA      -0.61
    scar        -0.59
    ashing      -0.59
    iverpool    -0.59
    illian      -0.57
    Positive Logits
    reprene          0.94
     everyone        0.82
     everybody       0.81
     to              0.80
     answers         0.78
     someone         0.76
     somebody        0.76
     revenge         0.76
     clarification   0.74
     clarity         0.72
    Act Density 0.091%

    No Known Activations