    Negative Logits
    Token     Value
    "on"      1.10
    "n"       0.92
    "ת"       0.90
    "ing"     0.88
    "т"       0.88
    "u"       0.86
    "н"       0.84
    "t"       0.82
    "ν"       0.82
    "at"      0.80
    Positive Logits
    Token            Value
    " a"             0.78
    " in"            0.68
    " of"            0.66
    " "              0.65
    "$,"             0.62
    (token missing)  0.61
    " l"             0.61
    (token missing)  0.58
    " samego"        0.58
    (token missing)  0.57
    Activation Density: 0.971%

    No Known Activations