INDEX
    Explanations

    words relating to health risks and potential harm

    terms related to the negative effects of substances or actions

    Negative Logits
    "rollers"    -0.69
    "ciples"     -0.66
    "ahs"        -0.65
    "zar"        -0.63
    "doms"       -0.62
    "ilde"       -0.62
    "gres"       -0.62
    "IDA"        -0.61
    "quart"      -0.60
    "elle"       -0.59
    Positive Logits
    " insofar"      0.95
    " enough"       0.93
    " owing"        0.87
    " unless"       0.86
    " compared"     0.85
    " towards"      0.80
    " because"      0.79
    " deterrent"    0.76
    " toward"       0.75
    " against"      0.74
    Activation Density: 0.245%

    No Known Activations