Explanations
references to prominent female figures, specifically actresses
Negative Logits
Token        Logit
str          -0.19
du           -0.16
utton        -0.16
etri         -0.16
ly           -0.15
jes          -0.15
eri          -0.15
Madden       -0.15
ugas         -0.15
fraction     -0.15
Positive Logits
Token          Logit
htmlentities    0.16
DM              0.16
secret          0.15
_banner         0.15
topl            0.15
봉              0.15
-banner         0.15
leigh           0.15
emax            0.14
izia            0.14
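Top and bottom logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes exactly that (it ignores any final layer norm), and the names `top_logit_tokens`, `decoder_dir`, `W_U`, and `vocab` are hypothetical placeholders, not an API of the dashboard that produced these numbers.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature.

    Assumption: logit effect = decoder_dir @ W_U, with no layer norm applied.
    decoder_dir: shape (d_model,)   feature's decoder direction (hypothetical)
    W_U:         shape (d_model, d_vocab)   unembedding matrix (hypothetical)
    vocab:       list of d_vocab token strings
    """
    logits = decoder_dir @ W_U          # shape (d_vocab,)
    order = np.argsort(logits)          # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage: with an identity unembedding, each token's logit is just the
# matching component of the decoder direction.
vocab = ["a", "b", "c"]
W_U = np.eye(3)
decoder_dir = np.array([0.2, -0.1, 0.05])
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=1)
print(neg, pos)
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other; a real dashboard would run this over the full vocabulary with `k=10`.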
Activation density: 0.034%
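Activation density is read here as the fraction of sampled tokens on which the feature fires, so 0.034% corresponds to roughly 340 activating tokens per 1,000,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the threshold choice are hypothetical, not taken from the dashboard.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when activation > threshold.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy usage: 340 active tokens out of 1,000,000 gives a density of 0.034%.
acts = np.zeros(1_000_000)
acts[:340] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

A density this low means the feature is very sparse, which fits a narrow explanation such as references to actresses.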