INDEX
    Explanations

    references to victims of crimes or abuse

    Negative Logits
    "enta"        -0.17
    "aber"        -0.17
    "apur"        -0.16
    "rn"          -0.15
    "iname"       -0.15
    "azor"        -0.14
    "-speaking"   -0.14
    "sian"        -0.14
    "aker"        -0.14
    "ald"         -0.14
    Positive Logits
    "hood"         0.17
    "friendly"     0.16
    "ëĭ¹"          0.16
    "ivors"        0.16
    "úsqueda"      0.15
    "änn"          0.14
    " Friendly"    0.14
    " innocent"    0.14
    "Äħż"          0.14
    " Zaman"       0.14
    Activation Density: 0.017%

    No Known Activations