    Explanations

    terms related to negative impacts or harm caused by policies or actions

    Negative Logits
    ignon         -0.19
    RLF           -0.17
     Animalia     -0.16
    ennon         -0.16
    crit          -0.15
    WidgetItem    -0.14
    cob           -0.14
    ebek          -0.14
    ZR            -0.14
    cis           -0.14
    Positive Logits
    æİī           0.20
     McM          0.16
     mec          0.16
    Å¡ÃŃ          0.16
    aken          0.15
    ogue          0.15
     efforts      0.15
    597           0.14
     Matth        0.14
    ahir          0.14
    Act Density 0.138%

    No Known Activations