INDEX
    Explanations

    words related to loss, harm, and negative consequences

    Negative Logits
    minus
    -0.17
    undle
    -0.16
    ruba
    -0.15
    rets
    -0.15
    방
    -0.14
     Dank
    -0.14
     Hairst
    -0.14
    agt
    -0.14
    lessness
    -0.13
    idine
    -0.13
    Positive Logits
    edla
    0.15
    baum
    0.14
    LING
    0.14
    inka
    0.14
    amera
    0.14
    posables
    0.14
    ayın
    0.14
    .scalablytyped
    0.14
    employment
    0.13
    Touches
    0.13
    Activation Density: 0.023%

    No Known Activations