INDEX
    Explanations

    expressions of excitement and sharing personal updates

    Negative Logits
    Token       Logit
    "airo"      -0.17
    "ucas"      -0.15
    "ainer"     -0.14
    "zel"       -0.14
    "ateg"      -0.14
    " Gel"      -0.13
    "ird"       -0.13
    "ucker"     -0.13
    "aneous"    -0.13
    "eper"      -0.13
    Positive Logits
    Token           Logit
    " everyone"     0.51
    " everybody"    0.48
    " Everyone"     0.44
    "everyone"      0.42
    " y"            0.42
    "Everyone"      0.41
    "大家"          0.40
    " Everybody"    0.39
    "Everybody"     0.37
    " anyone"       0.31
    Activation Density: 0.198%

    No Known Activations