    Explanations

    questions or statements followed by actions or intentions to be carried out

    phrases that involve asking questions or addressing issues

    Negative Logits
    Token      Logit
    aughed     -0.64
     condem    -0.62
    been       -0.59
    ãĤ¦ãĤ¹     -0.57
    tips       -0.55
    /          -0.55
    è¦         -0.53
    \">        -0.53
    Prev       -0.53
    +.         -0.52
    Positive Logits
    Token        Logit
     requires    1.10
     we          0.87
    requires     0.83
     please      0.81
     oneself     0.77
     involves    0.77
     you         0.76
     depends     0.75
     lies        0.69
     properly    0.69
    Activation Density 0.244%

    No Known Activations