INDEX
    Explanations

    mentions of safety and security concerns related to specific locations or communities

    Negative Logits
    Token        Logit
    "ovit"       -0.16
    "PTY"        -0.16
    "浦"         -0.15
    "à¹īว"      -0.15
    "ipop"       -0.15
    " filmer"    -0.14
    "ÑĤов"      -0.14
    "998"        -0.14
    "ìĹ´"       -0.14
    "_plate"     -0.14
    Positive Logits
    Token        Logit
    " Manit"     0.21
    "719"        0.21
    " COS"       0.18
    " Palmer"    0.17
    " Wide"      0.17
    "arak"       0.17
    " Sang"      0.16
    " Peyton"    0.16
    " Mueller"   0.16
    " CSP"       0.16
    Activation Density: 0.012%

    No Known Activations