INDEX
    Explanations

    dietary restrictions and health

    NEGATIVE LOGITS
    ש                 1.36
    [no token shown]  1.19
    [no token shown]  1.17
    м                 1.14
    ب                 1.10
    [no token shown]  1.07
    ר                 1.05
    ために            1.04
    [no token shown]  1.04
    υ                 1.02
    POSITIVE LOGITS
    on    1.70
    a     1.55
    ir    1.53
    k     1.20
    al    1.15
    ten   1.13
    ty    1.09
    and   1.08
    ing   1.06
    to    1.05
    Act Density 0.001%

    No Known Activations