INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Value
    "5"           0.88
    "4"           0.76
    "8"           0.73
    "I"           0.72
    "."           0.71
    "7"           0.71
    "۰۰"          0.68
    "Calories"    0.67
    "IService"    0.67
    "2"           0.67
    POSITIVE LOGITS
    Token         Value
    " of"         1.34
    "of"          1.03
    "w"           0.88
    "en"          0.84
    " on"         0.83
    " של"         0.80
    "n"           0.78
    "m"           0.77
    "ne"          0.76
    "re"          0.76
    Activation Density: 1.432%

    No Known Activations