    Explanations

    expressions related to intellectualism and critique of societal norms

    Negative Logits
    Token       Logit
    "riot"      -0.15
    "Guy"       -0.14
    "ewolf"     -0.14
    " milf"     -0.14
    "811"       -0.14
    "alth"      -0.14
    "izard"     -0.14
    " Cougar"   -0.13
    "ordo"      -0.13
    "ehir"      -0.13

    Positive Logits
    Token       Logit
    " types"    0.26
    "-types"    0.22
    "types"     0.22
    " flakes"   0.21
    " rub"      0.21
    " ecc"      0.21
    " provinc"  0.21
    " Types"    0.21
    " mis"      0.20
    " dol"      0.20
    Act Density 0.371%

    No Known Activations