INDEX
    Explanations

    simple solution

    Negative Logits
     greet            -0.07
    Extra             -0.07
    Blocking          -0.06
    _builder          -0.06
     assessment       -0.06
     рук              -0.06
    .uniform          -0.06
     red              -0.06
     نور              -0.06
     dobře            -0.06
    Positive Logits
     perf             0.08
     intermittent     0.07
     scientifically   0.07
     sunset           0.07
    (inp              0.07
     Tanzania         0.06
     ext              0.06
    allel             0.06
    _]                0.06
     canc             0.06
    Activation Density: 0.011%

    No Known Activations