INDEX
    Explanations

    references to food items, specifically burritos

    Negative Logits
    ober     -0.18
    succ     -0.16
    elly     -0.16
    arem     -0.16
    otre     -0.16
    aÅĻ      -0.16
    esp      -0.15
    tiv      -0.15
    elle     -0.15
    epar     -0.14
    Positive Logits
    rough    0.35
    rows     0.33
    rowing   0.33
    rito     0.32
    row      0.29
    ritos    0.29
    ied      0.29
    leigh    0.28
    undi     0.28
    dock     0.28
    Act Density 0.009%

    No Known Activations