INDEX
    Explanations

    references to injury or harm

    Negative Logits
     Affero     -0.16
    Nhap        -0.15
    ISBN        -0.15
    ropolis     -0.15
    ressive     -0.15
    urple       -0.14
    ån          -0.14
    491         -0.14
    osa         -0.14
    opus        -0.14
    Positive Logits
    defs        0.15
    ocale       0.14
     Em         0.14
     Zug        0.14
    ULE         0.14
    ÑĻ          0.13
     electrom   0.13
    aldi        0.13
     hans       0.13
    stk         0.13
    Act Density 0.000%

    No Known Activations