INDEX
    Explanations

    references to emotional and harsh descriptors

    Negative Logits
    antan     -0.15
    ickers    -0.15
    elf       -0.15
    iegel     -0.14
    ík        -0.14
    orman     -0.14
    agas      -0.14
    шила      -0.13
    ella      -0.13
    ита       -0.13

    Positive Logits
    ksam      0.15
    eru       0.14
    rox       0.14
    ucz       0.14
    oux       0.14
     Pont     0.14
     BRO      0.14
    etxt      0.14
    è¥        0.13
    ativ      0.13
    Activation Density: 0.023%
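
The density figure above is presumably the share of sampled tokens on which this feature's activation is nonzero. A minimal sketch of that calculation follows, assuming a plain list of per-token activation values. The function name `activation_density` and the `threshold` parameter are hypothetical and not taken from the dashboard, which may define density differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density means "share of tokens where the feature fires";
    the dashboard's exact definition is not stated in the source.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up for illustration): the feature fires on 1 of 8 tokens.
sample = [0.0, 0.0, 0.0, 2.1, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 12.500%
```

Under this reading, a density of 0.023% corresponds to the feature firing on roughly 23 of every 100,000 tokens.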

    No Known Activations