INDEX
    Explanations
    Negative Logits
        " satisfactory"     -0.06
        " discovers"        -0.06
        " cable"            -0.06
        " marginalized"     -0.06
        " axis"             -0.06
        "illard"            -0.06
        " bron"             -0.06
        " reflex"           -0.06
        " score"            -0.06
        "coles"             -0.06
    Positive Logits
        "uento"             0.07
        "lean"              0.06
        "ynthia"            0.06
        "-worker"           0.06
        "(proto"            0.06
        "fullName"          0.06
        " Україні"          0.06
        "Countries"         0.06
        "!↵↵↵"              0.06
        "_^("               0.06
    Act Density 0.001%

    No Known Activations