INDEX
    Explanations
    Negative Logits
    Token               Logit
    _instruction        -0.06
     misunderstanding   -0.06
    atty                -0.06
    [token not shown]   -0.06
    strap               -0.06
    Badge               -0.06
    stav                -0.06
    utils               -0.06
    [token not shown]   -0.06
     basin              -0.06
    Positive Logits
    Token               Logit
    =UTF                0.08
    ~":"                0.07
     Fourier            0.07
    ,//                 0.07
    [token not shown]   0.07
    =$(                 0.07
     المنت              0.06
     леч                0.06
     depending          0.06
    [token not shown]   0.06
    Act Density 0.000%

    No Known Activations