    Explanations

    mentions of clothing items, particularly hats and headwear

    Negative Logits
    Token         Logit
    "arda"        -0.17
    "yte"         -0.17
    " bufsize"    -0.15
    "MOVED"       -0.15
    " thighs"     -0.14
    "dge"         -0.14
    "shirt"       -0.14
    " Sofa"       -0.13
    "icha"        -0.13
    "æijĨ"        -0.13
    Positive Logits
    Token         Logit
    " hat"        0.54
    " hats"       0.49
    "hat"         0.47
    " Hat"        0.46
    "Hat"         0.42
    " Hats"       0.40
    "帽"          0.39
    "_hat"        0.36
    " cap"        0.35
    " caps"       0.34
    Act Density 0.056%

    No Known Activations