    Explanations

    references to trousers or different types of pants

    Negative Logits
    Token (↵ = newline)    Logit
    })}\                   -0.57
    }}}}                   -0.57
    }}}                    -0.56
    ')))                   -0.55
    )});                   -0.54
    "))↵                   -0.54
    })));                  -0.54
    )))↵                   -0.53
    }});                   -0.53
    }},\                   -0.53
    Positive Logits
    Token (␣ = leading space)    Logit
    ␣pants                       2.25
    ␣Pants                       2.20
    Pants                        1.98
    pants                        1.84
    ␣pant                        1.44
    ␣trousers                    1.37
    ␣Pant                        1.30
    Pant                         1.23
    pant                         1.20
    ␣trouser                     1.20
    Activation Density: 0.002%

    No Known Activations