INDEX
    Explanations

    preventing harm and exploitation

    Negative Logits
    Token          Logit
    " nifty"       1.02
    " tasty"       0.97
    " funky"       0.92
    " wacky"       0.88
    " quirky"      0.88
    "=!"           0.87
    " handy"       0.86
    " weird"       0.85
    " annoying"    0.83
    " giz"         0.83
    Positive Logits
    Token               Logit
    " trauma"           1.30
    " tragically"       1.25
    " retra"            1.18
    " heartbreaking"    1.09
    " compassion"       1.06
    " traumas"          1.05
    " Trauma"           1.05
    " compassionate"    1.04
    " harm"             1.03
    " sadly"            1.03
    Activation Density: 1.814%

    No Known Activations