INDEX
    Explanations

    references to clothing and accessories

    Negative Logits
     shirts     -0.16
    dge         -0.15
     bufsize    -0.15
    shirt       -0.15
     dresses    -0.15
    yte         -0.14
     Dresses    -0.14
    arda        -0.14
    762         -0.14
     jeans      -0.14

    Positive Logits
     hat        0.56
    hat         0.47
     hats       0.47
     Hat        0.47
    Hat         0.43
    帽          0.42
     Hats       0.38
     cap        0.36
    _hat        0.36
     caps       0.35

    Activation Density: 0.075%

    No Known Activations