INDEX
    Explanations

    concepts or situations that evoke discomfort or unease

    instances of discomfort or unease

    Negative Logits
    ership          -1.07
    ework           -0.95
    ebook           -0.80
    ribution        -0.78
    runner          -0.77
    successful      -0.77
    uilding         -0.77
    ivism           -0.73
    lass            -0.73
    enforcement     -0.73
    Positive Logits
     uncomfortable  0.98
     discomfort     0.88
     adolesc        0.74
     une            0.74
     awkward        0.73
     truths         0.72
    NESS            0.72
     tiss           0.70
     Osw            0.68
    nesses          0.68
    Act Density 0.025%

    No Known Activations