INDEX
    Explanations

    safety, respect, and care

    Negative Logits
        " repur"          0.87
        " Retry"          0.83
        " repurposed"     0.80
        " prefabricated"  0.79
        " uptick"         0.78
        " incentivize"    0.78
        " PTSD"           0.76
        " KPIs"           0.76
        " sparen"         0.75
        " disgruntled"    0.75
    Positive Logits
        "respect"         0.80
        "Equity"          0.79
        " affirming"      0.79
        " Affirm"         0.75
        " Equity"         0.75
        "intel"           0.75
        " EQUITY"         0.75
        " respect"        0.74
        "affirming"       0.74
        " лично"          0.73
    Activation Density: 0.237%
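    Activation density is usually read as the share of token positions on which the feature fires, so 0.237% corresponds to roughly 2.37 firing positions per 1,000 tokens. A minimal sketch, assuming "fires" means an activation above a threshold of zero; the function name `activation_density` and the synthetic activations are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if len(acts) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Synthetic data: 237 firing positions out of 100,000 tokens.
acts = [1.0] * 237 + [0.0] * (100_000 - 237)
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.237%
```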

    No Known Activations