INDEX
    Explanations

    words related to clothing or attire

    references to dress codes and dressing up

    Negative Logits
    Token           Logit
    "ntil"          -0.77
    "venants"       -0.73
    "SHIP"          -0.72
    "rw"            -0.70
    "apt"           -0.68
    "ichael"        -0.67
    "ocalyptic"     -0.63
    "JV"            -0.63
    "SPONSORED"     -0.60
    "asper"         -0.60
    Positive Logits
    Token           Logit
    " gown"         0.95
    "glers"         0.92
    " rehearsal"    0.90
    "uce"           0.86
    "maker"         0.84
    "bag"           0.83
    " shirts"       0.82
    " shoes"        0.81
    " uniforms"     0.79
    " dresses"      0.79
    Activation Density: 0.031%

    No Known Activations