    Explanations

    references to body positivity and the discussion of societal beauty standards

    Negative Logits
    mux            -0.15
    bere           -0.15
    θη             -0.14
    dle            -0.13
     sentimental   -0.13
    óm             -0.13
    ux             -0.13
    лоÑĩ           -0.13
     sodom         -0.13
     Geoff         -0.13
    Positive Logits
     body     0.35
     beauty   0.33
     Body     0.29
     Beauty   0.28
     BODY     0.26
    Beauty    0.26
    Body      0.25
    /body     0.24
    -body     0.24
     bodies   0.24
    Activation Density: 0.115%

    No Known Activations