INDEX
Explanations
references to trousers or different types of pants
Negative Logits
Token     Logit
})}\      -0.57
}}}}      -0.57
}}}       -0.56
')))      -0.55
)});      -0.54
"))       -0.54
})));     -0.54
)))       -0.53
}});      -0.53
}},\      -0.53
Positive Logits
Token     Logit
pants     2.25
Pants     2.20
Pants     1.98
pants     1.84
pant      1.44
trousers  1.37
Pant      1.30
Pant      1.23
pant      1.20
trouser   1.20
Activations Density 0.002%