INDEX
    Explanations

    mentions of enjoyable food items or rewards

    references to treats or special foods

    Negative Logits
     constitu    -0.69
     autonomous  -0.67
     moot        -0.65
    ova          -0.64
     Karin       -0.62
    seless       -0.62
    dc           -0.59
     condem      -0.59
     Citiz       -0.59
     Dani        -0.59

    Positive Logits
    ises         1.12
    ties         0.95
    ise          0.95
    nels         0.94
    itionally    0.90
    pieces       0.90
    piece        0.90
    orial        0.87
    terson       0.85
    ery          0.85
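    The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The dashboard does not say how those scores are computed. A common convention for SAE dashboards is the direct product of the feature's decoder vector with the model's unembedding matrix. The sketch below assumes that convention. The helper name top_logit_effects and its plain-list inputs are hypothetical and are not taken from any particular library.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Hypothetical helper: rank tokens by how much one feature's decoder
    direction raises or lowers their logits (assumed convention: decoder_dir @ W_U).

    decoder_dir: list of d_model floats (the feature's decoder row, assumed)
    unembed: d_model rows, each a list of vocab_size floats (W_U, assumed)
    vocab: list of vocab_size token strings
    """
    # Direct logit effect: one dot product per vocabulary token.
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return negative, positive
```

    With real weights, k=10 would produce two ten-entry lists shaped like the Negative Logits and Positive Logits lists above.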
    Activation Density 0.024%
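    Activation density is the share of tokens on which the feature fires. The dashboard does not state its firing threshold. The sketch below assumes that any activation above zero counts as a firing. The helper name activation_density is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose feature activation
    exceeds `threshold` (assumed to be 0.0), so 0.024 means 0.024%."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```

    For example, 24 firings in 100,000 tokens give 0.024%. That is roughly one token in 4,200, so this is a very sparse feature.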

    No Known Activations