    Explanations

    references to accountability and complicity in societal issues

    Negative Logits
    "aller"          -0.19
    "åĭĴ" (勒)       -0.15
    "iegel"          -0.14
    " unfamiliar"    -0.14
    " humble"        -0.14
    "æŁĦ" (柄)       -0.14
    " nerd"          -0.14
    "iso"            -0.14
    "mann"           -0.14
    "èıľ" (菜)       -0.13
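Some entries in the logit lists (åĭĴ, æŁĦ, èıľ, and åħģ among the positive logits) look like mojibake but are probably byte-level BPE token strings. Assuming the model uses a GPT-2-style byte-level BPE vocabulary (the page does not name the tokenizer), each printable character stands for one raw byte, and the recovered bytes decode as UTF-8. The sketch below re-implements GPT-2's byte-to-character table and inverts it; `decode_token` is a hypothetical helper name, not a library API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    # Every remaining byte is shifted to a printable code point starting at U+0100.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte.
_UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into readable text."""
    raw = bytes(_UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["åĭĴ", "æŁĦ", "èıľ", "åħģ"]:
        print(tok, "->", decode_token(tok))
```

Under this assumption the four tokens are the complete characters 勒, 柄, 菜 and 允; the last one means "allow" or "permit", which fits the positive list.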
    Positive Logits
    " cond"           0.23
    " support"        0.21
    " permit"         0.20
    " allowing"       0.18
    " enabling"       0.18
    " comp"           0.18
    " enable"         0.18
    "åħģ" (允)        0.18
    " allow"          0.18
    " toler"          0.18
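The negative and positive logit lists read as direct logit effects. On SAE feature dashboards these are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. This page does not say how its numbers were produced, so treat the following as a dependency-free sketch under that assumption, ignoring any final LayerNorm. The function name `logit_effects` and all toy vectors are hypothetical.

```python
import heapq

# Hypothetical toy data: a 2-dimensional residual stream and a 4-token vocabulary.
TOY_VOCAB = [" allow", " permit", " humble", "aller"]
TOY_W_DEC = [1.0, 0.5]  # feature decoder direction, length d_model
TOY_W_U = [             # unembedding matrix, d_model rows x n_vocab columns
    [0.2, 0.1, -0.1, -0.2],
    [0.1, 0.2, 0.0, 0.1],
]


def logit_effects(w_dec, w_u, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Each score is the dot product of the decoder direction with one column of
    the unembedding matrix, i.e. the feature's direct effect on that token's logit.
    """
    d_model = len(w_dec)
    scores = [
        sum(w_dec[i] * w_u[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])
    return negative, positive


if __name__ == "__main__":
    neg, pos = logit_effects(TOY_W_DEC, TOY_W_U, TOY_VOCAB, k=2)
    for token, score in neg + pos:
        print(repr(token), round(score, 2))
```

With NumPy the scoring step collapses to one matrix-vector product (`w_dec @ w_u`) followed by `argsort`; the loop form only keeps the sketch free of third-party dependencies.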
    Act Density 0.287%
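Act Density is presumably the share of sampled tokens on which this feature fires. Assuming "fires" means an activation strictly above zero (the page states neither the threshold nor the sample size), the figure can be computed as in this sketch. The helper `activation_density` and the sample below are hypothetical; 0.287% would correspond, for example, to 287 active tokens in 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("activation_density needs at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 287 firing tokens out of 100,000.
    sample = [0.0] * 99_713 + [1.0] * 287
    print(f"Act Density {activation_density(sample):.3f}%")  # prints "Act Density 0.287%"
```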

    No Known Activations