INDEX
    Explanations

    themes of societal expectations and gender roles

    Negative Logits
    Token         Logit
     depreci      -0.15
    brick         -0.14
    rawer         -0.14
    éĿĪ           -0.14
    unner         -0.14
    nerg          -0.13
    оÑĢаз         -0.13
     ðŁĶ          -0.13
     nackt        -0.13
    ادا           -0.13
    Positive Logits
    Token         Logit
     sweetness    0.35
     innocent     0.31
     sweet        0.31
     gentle       0.30
     sugar        0.30
     sug          0.29
     gent         0.29
     soft         0.29
     nic          0.27
     nice         0.27
    Act Density 0.490%

    No Known Activations