    Explanations

    words related to censorship or inappropriate content

    words related to the concept of "uncertainty" or "unknown"

    Negative Logits
    "hyde"                -0.71
    " desk"               -0.68
    "tery"                -0.64
    "++++++++++++++++"    -0.63
    " Monthly"            -0.61
    "MENT"                -0.61
    " upholding"          -0.60
    " menstrual"          -0.58
    " Solitaire"          -0.58
    "Ò"                   -0.58

    Positive Logits
    "redited"             1.45
    "ategor"              1.43
    "orrect"              1.38
    "ritical"             1.37
    "ooked"               1.30
    "ount"                1.29
    "outh"                1.28
    "apped"               1.27
    "ivil"                1.25
    "ustom"               1.25

    Activation Density 0.023%

    No Known Activations