INDEX
    Explanations

    references to popular snacks and comfort foods

    Negative Logits
    " Soup"      -0.17
    "Soup"       -0.16
    " MV"        -0.15
    " soup"      -0.15
    "itel"       -0.15
    "Slash"      -0.15
    " Blade"     -0.15
    "_lambda"    -0.15
    "ngine"      -0.15
    "soup"       -0.15
    Positive Logits
    " brittle"   0.21
    " bars"      0.17
    " reward"    0.17
    " snack"     0.16
    " Reward"    0.15
    "bars"       0.15
    " Geh"       0.15
    " rewards"   0.14
    "nyder"      0.14
    " Nab"       0.14
    Act Density 0.111%

    No Known Activations