    Explanations

    phrases related to negative attributes or impacts

    mentions of negative concepts or experiences

    Negative Logits
    DOM          -0.87
    conservancy  -0.81
    ITNESS       -0.81
    heet         -0.78
    dropping     -0.77
    hower        -0.77
    plain        -0.77
    pread        -0.77
    abiding      -0.76
    raltar       -0.76
    Positive Logits
     reinforcement  1.05
     spiral         0.90
     Negative       0.89
     gearing        0.86
     impact         0.85
     effects        0.85
     consequence    0.84
     feedback       0.82
     consequences   0.82
     publicity      0.82
    Act Density 0.023%

    No Known Activations