INDEX
    Explanations

    food items and descriptive terms related to food

    Negative Logits
    Token        Logit
    "atem"       -0.75
    "orc"        -0.69
    "etsk"       -0.67
    "plin"       -0.63
    "orem"       -0.63
    "atis"       -0.62
    " forgiven"  -0.60
    " shake"     -0.59
    "rolet"      -0.58
    "amaru"      -0.58
    Positive Logits
    Token        Logit
    "th"         1.36
    "61"         1.16
    "87"         1.16
    "81"         1.16
    "92"         1.15
    "06"         1.14
    "91"         1.14
    "84"         1.14
    "71"         1.13
    "00"         1.13
    Activation Density 0.863%
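
    The density figure is not defined on this page. A common reading, assumed here, is the percentage of token positions in a sampled dataset where the feature's activation exceeds zero. The minimal Python sketch below follows that assumption; the function name `activation_density` and the default threshold of 0.0 are illustrative choices, not taken from the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption (not from the source page): density is
    (number of tokens with activation > threshold) / (total tokens) * 100.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    # Count positions where the feature is considered "active".
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active position out of four gives 25.0 (percent).
print(activation_density([0.0, 0.0, 1.2, 0.0]))
```

    Under this definition, a density of 0.863% would correspond to roughly 863 active positions per 100,000 sampled tokens.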

    No Known Activations