INDEX
Explanations
phrases related to eating and diet
Negative Logits
coffee    -0.15
Coffee    -0.15
wines     -0.15
mlin      -0.14
atcher    -0.14
edics     -0.14
999       -0.14
Wine      -0.14
uz        -0.14
aks       -0.14
Positive Logits
/dr        0.25
disorders  0.23
Disorders  0.20
Disorder   0.19
disorder   0.18
/shop      0.17
ery        0.17
vor        0.17
habits     0.16
humble     0.15
Activations Density 0.039%