INDEX
    Explanations

    `ade` followed by `cre` or `class`

    Negative Logits
    un              0.93
    as              0.90
    o               0.88
    il              0.79
    ul              0.79
    n               0.76
    um              0.75
    not             0.72
    ate             0.68
    ov              0.67
    Positive Logits
    ורי             0.69
    ков             0.68
     outstretched   0.65
     evaded         0.63
    ні              0.63
    ким             0.61
    קד              0.60
    ковского        0.59
    شي              0.59
    кин             0.59
    Act Density 0.000%

    No Known Activations