INDEX
    Explanations

    words related to cleansing or purging

    Negative Logits
    Token            Logit
    " Unch"          -0.71
    "enegger"        -0.71
    " Standing"      -0.70
    "worthiness"     -0.66
    " Werner"        -0.63
    "areth"          -0.63
    "ONES"           -0.63
    " Anxiety"       -0.62
    " helmets"       -0.61
    " Luther"        -0.61
    Positive Logits
    Token            Logit
    "ple"            1.32
    "vey"            1.26
    "ported"         1.16
    "POSE"           1.15
    "pose"           1.12
    "pure"           1.09
    "poses"          1.09
    "ples"           1.06
    "ging"           1.00
    "posed"          0.99
    Activation Density: 0.078%

    No Known Activations