    Explanations

    references to different types of food, particularly pizza and sandwich-related terms

    Negative Logits
    Token         Logit
    " Neal"       -0.16
    "lessly"      -0.15
    "asi"         -0.15
    "ABA"         -0.14
    "IPA"         -0.14
    " Coffee"     -0.14
    "Neal"        -0.14
    " coffee"     -0.14
    " Fol"        -0.14
    " rum"        -0.14
    Positive Logits
    Token         Logit
    " joint"      0.26
    "joint"       0.24
    " joints"     0.24
    " Joint"      0.24
    "Joint"       0.23
    " Parl"       0.20
    "_joint"      0.18
    " shop"       0.18
    " maker"      0.18
    " parl"       0.18
    Activation Density: 0.076%

    No Known Activations