    Explanations

    negative adjectives formed by attaching the prefix 'un-' to descriptive words

    negative prefixes, particularly 'un-', that mark words associated with negativity or absence

    Negative Logits
        " rows"         -0.78
        " flats"        -0.75
        " periphery"    -0.72
        " racks"        -0.71
        " bruises"      -0.70
        " indoors"      -0.70
        " chained"      -0.70
        " separately"   -0.70
        " stalls"       -0.70
        " deletion"     -0.69
    Positive Logits
        "help"           1.48
        "productive"     1.42
        "balanced"       1.38
        "important"      1.35
        "interesting"    1.32
        "professional"   1.30
        "readable"       1.27
        "original"       1.27
        "inspired"       1.26
        "ruly"           1.26
    Activation density: 0.029%

    No Known Activations