INDEX
    Explanations

    references to dessert items

    Negative Logits
    Token          Logit
    "ought"        -0.80
    "orne"         -0.78
    "vernment"     -0.71
    "aird"         -0.69
    "reen"         -0.69
    "sighted"      -0.68
    "ne"           -0.67
    "ostics"       -0.67
    "away"         -0.67
    "nesota"       -0.67
    Positive Logits
    Token          Logit
    "essert"       1.04
    " dessert"     0.97
    " desserts"    0.93
    " pudding"     0.92
    " Dough"       0.89
    "ecake"        0.83
    " dough"       0.82
    " cust"        0.80
    " batter"      0.79
    " pastry"      0.79
    Activation Density: 0.026%

    No Known Activations