INDEX
    Explanations

    references to hats and headwear

    Negative Logits
    " clothes"    -0.16
    " Hou"        -0.15
    "endon"       -0.15
    " Clothes"    -0.15
    " dresses"    -0.15
    " Clothing"   -0.14
    "_skin"       -0.14
    " Hund"       -0.14
    " skin"       -0.14
    " hyp"        -0.14
    Positive Logits
    " hat"         0.76
    " hats"        0.69
    " Hat"         0.67
    "hat"          0.62
    "Hat"          0.60
    "帽"           0.59  (Chinese for "hat")
    " Hats"        0.55
    "_hat"         0.49
    " cap"         0.49
    " caps"        0.45
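    Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that definition; the function name `top_logit_effects` and the toy weights are hypothetical, and a real pipeline would use the actual SAE decoder row and the model's W_U.

```python
def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Rank vocabulary tokens by how strongly a feature direction pushes their logits.

    decoder_vec: list of d floats (the feature's decoder direction; assumed input).
    unembed: d rows, each a list of len(vocab) floats (assumed W_U layout).
    Returns (positive, negative): the k highest- and k lowest-scoring (token, score) pairs.
    """
    d = len(decoder_vec)
    # Score for token j is the dot product of the decoder direction with column j of W_U.
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example (hypothetical numbers): 2-dimensional residual stream, 4-token vocabulary.
vocab = [" hat", " cap", " clothes", " skin"]
decoder_vec = [1.0, 0.0]
unembed = [
    [0.8, 0.5, -0.2, -0.1],
    [0.3, 0.3, 0.3, 0.3],
]
positive, negative = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    Pure Python is used only to keep the sketch dependency-free; with real weights a matrix library would compute all scores in one product.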
    Activation density: 0.138%
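    Assuming the density figure is the share of sampled token positions on which the feature is active, 0.138% corresponds to roughly 1.4 firings per 1,000 tokens. The sketch below computes that share; the function name `activation_density` and the activation threshold of 0.0 are assumptions, not documented settings of the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds `threshold`.

    Assumes density means firing frequency over a token sample; the default
    threshold of 0.0 is an assumption, not a documented setting.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy sample (hypothetical values): the feature fires on 1 of 4 token positions.
sample = [0.0, 0.0, 2.3, 0.0]
print(f"Activation density: {activation_density(sample):.3f}%")
```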

    No Known Activations