INDEX
    Explanations

    questioning statements or inquiries about various topics

    Negative Logits
    "them"         -0.96
    " them"        -0.84
    "these"        -0.81
    "这两个"       -0.71
    "These"        -0.70
    " These"       -0.70
    "those"        -0.69
    "Those"        -0.68
    " these"       -0.68
    "you"          -0.67
    Positive Logits
    " anyone"      0.97
    " anybody"     0.89
    " everyone"    0.82
    " everybody"   0.79
    "anyone"       0.75
    " Anybody"     0.74
    " Anyone"      0.67
    " ANYONE"      0.66
    " Everyone"    0.65
    "Anybody"      0.64
    Act Density 0.167%

    No Known Activations