INDEX
    Explanations

    expressions related to community engagement and pride

    NEGATIVE LOGITS
    shit        -0.15
    hone        -0.15
    aster       -0.15
     GENERIC    -0.15
    fuck        -0.14
     fuck       -0.14
    god         -0.14
    Fuck        -0.14
    utow        -0.13
    ASTER       -0.13
    POSITIVE LOGITS
     Extra      0.15
     Fork       0.15
     EXTRA      0.14
     Fang       0.14
    arf         0.14
     pek        0.14
     proverb    0.14
    Extra       0.14
     extra      0.14
    ただ          0.14
    Act Density 0.506%

    No Known Activations