INDEX
    Explanations

    mentions of food-related words, particularly those indicating deliciousness

    references to food and cooking, particularly appealing dishes

    Negative Logits
    Token           Logit
    " cath"         -0.79
    "sold"          -0.76
    "ãĥĩãĤ£"        -0.72
    "åĤ"            -0.71
    "izational"     -0.70
    "ttle"          -0.69
    "GROUND"        -0.67
    "walker"        -0.66
    "thood"         -0.65
    " stances"      -0.65

    Positive Logits
    Token           Logit
    " Delicious"    1.01
    "ness"          0.82
    "avorite"       0.81
    "upid"          0.80
    "vous"          0.79
    "nesses"        0.78
    "isine"         0.76
    "endish"        0.75
    "ery"           0.74
    "ï¸"            0.74
    Activation Density: 0.028%

    No Known Activations