Explanations
phrases related to weight loss and body image
Negative Logits
ًا       -0.27
're      -0.19
أيض      -0.18
's       -0.18
ain      -0.17
isn      -0.17
aren     -0.16
Aren     -0.16
hasn     -0.16
doesn    -0.16
Positive Logits
Dont     0.54
dont     0.48
dont     0.45
didnt    0.45
nt       0.45
cant     0.45
cant     0.45
doesnt   0.43
youre    0.43
Whats    0.43
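Logit lists like the two above are often produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. This page does not state how its lists were computed, so the sketch below assumes that method; `logit_effects`, `w_dec_row`, `w_u`, and `vocab` are hypothetical names used only for illustration.

```python
import numpy as np


def logit_effects(w_dec_row: np.ndarray, w_u: np.ndarray, vocab: list[str], k: int = 10):
    """Rank tokens by one feature's direct effect on the output logits.

    Assumption: the effect is w_dec_row @ w_u, where
      w_dec_row has shape (d_model,)            (the feature's decoder vector)
      w_u       has shape (d_model, vocab_size) (the unembedding matrix).
    Returns (positive, negative), each a list of k (token, value) pairs.
    """
    effects = w_dec_row @ w_u          # shape (vocab_size,)
    order = np.argsort(effects)        # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative
```

Under this assumption, a large positive value means the feature directly raises that token's logit when it fires, and a negative value means it suppresses the token.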
Activation Density: 1.246%
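Activation density is the percentage of tokens on which the feature is active. At 1.246%, the feature fires on roughly one token in 80. The sketch below assumes "active" means an activation strictly above zero; the threshold and the function name `activation_density` are assumptions, not taken from this page.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    acts: 1-D array with one activation value per token (assumed layout).
    threshold: assumed cutoff for counting the feature as active.
    """
    return 100.0 * float(np.mean(acts > threshold))
```

For example, if the feature is active on 2 of 8 tokens, the density is 25.0%.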