INDEX
    Explanations

    phrases related to weight loss and body image

    Negative Logits
    Ùĭا    -0.27
    're    -0.19
     Ø£ÙĬض    -0.18
    's    -0.18
     ain    -0.17
     isn    -0.17
     aren    -0.16
     Aren    -0.16
     hasn    -0.16
     doesn    -0.16
    Positive Logits
     Dont    0.54
    dont    0.48
     dont    0.45
     didnt    0.45
    nt    0.45
     cant    0.45
    cant    0.45
     doesnt    0.43
     youre    0.43
    Whats    0.43
    Act Density 1.246%

    No Known Activations