INDEX
    Explanations

    words related to providing helpful suggestions or feedback

    terms related to constructive actions and feedback, often in contrast to abusive or negative behavior

    Negative Logits
    Token        Logit
    "orph"       -0.90
    "atch"       -0.80
    "urg"        -0.76
    "ared"       -0.75
    "aver"       -0.72
    "gars"       -0.72
    "alm"        -0.69
    "olog"       -0.69
    "paralle"    -0.69
    "andra"      -0.68
    Positive Logits
    Token              Logit
    " constructive"    1.15
    " criticism"       0.85
    " feedback"        0.83
    "-+-+"             0.77
    " entreprene"      0.76
    " redes"           0.75
    " repr"            0.74
    " sunlight"        0.71
    " daylight"        0.71
    " criticisms"      0.70
    Activation Density: 0.011%

    No Known Activations
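
    The page does not say how the activation density reported above is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero. The function name, the threshold, and the sample data are all hypothetical and chosen only to illustrate the arithmetic behind a figure like 0.011%.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means firing frequency over a token sample;
    the dashboard itself does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 11 of which fire.
sample = [0.0] * 99_989 + [1.3] * 11
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.011%"
```

    Under that assumption, a density of 0.011% corresponds to the feature firing on roughly 11 of every 100,000 tokens.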