INDEX
    Explanations

    discussions around controversial social topics

    Negative Logits
    Token        Logit
    "Anyway"     -0.16
    " sez"       -0.16
    "illon"      -0.15
    "bbe"        -0.14
    " Anyway"    -0.14
    "aar"        -0.14
    " kok"       -0.14
    " pok"       -0.14
    " milieu"    -0.14
    "zb"         -0.13
    Positive Logits
    Token        Logit
    " apart"     0.25
    " engr"      0.23
    " void"      0.20
    " priv"      0.19
    " cater"     0.19
    " able"      0.17
    " ran"       0.17
    " heavily"   0.16
    " plaster"   0.16
    " preca"     0.16
    Activation Density: 0.434%

    No Known Activations