INDEX
    Explanations

    whole grains and fruits

    Negative Logits
    is        1.92
    s         1.59
    et        1.50
    a         1.38
    v         1.34
    r         1.30
    u         1.30
    ing       1.23
    in        1.20
    se        1.17
    Positive Logits
    ρα        1.11
    är        1.07
    ية        1.04
    (         1.04
    ри        0.95
    לת        0.87
    μπ        0.86
    تان       0.86
              0.84
     отмети   0.83
    Act Density 0.013%

    No Known Activations