INDEX
    Explanations

    food-related instructions or descriptions

    Negative Logits
    Token             Logit
    " pudding"        -0.17
    "erah"            -0.16
    "Cake"            -0.16
    " pancakes"       -0.15
    "etine"           -0.15
    " strawberry"     -0.15
    "cakes"           -0.15
    " Cake"           -0.15
    "andles"          -0.15
    " Dess"           -0.15
    Positive Logits
    Token             Logit
    " chips"          0.42
    " Chips"          0.37
    " chip"           0.37
    "chip"            0.34
    " Chip"           0.31
    " cris"           0.30
    "Chip"            0.29
    " snack"          0.28
    " snacks"         0.27
    " crackers"       0.25
    Act Density: 0.061%

    No Known Activations