INDEX
    Explanations

    negative descriptors and references to moral wrongdoing

    Negative Logits
    aison         -0.17
    èn            -0.16
    Abstract      -0.14
     breeze       -0.14
     Gir          -0.14
     fragrance    -0.14
     Prest        -0.13
    uras          -0.13
    quia          -0.13
     Fre          -0.13
    Positive Logits
    mort          0.18
    mue           0.15
    arius         0.15
    ipel          0.14
    nop           0.14
    _simps        0.14
    imei          0.14
    PILE          0.14
    adu           0.14
    lest          0.14
    Activation Density: 0.098%

    No Known Activations