    Negative Logits
    " injections"   -0.09
    " Wak"          -0.08
    " bp"           -0.08
    "BP"            -0.08
    " richt"        -0.07
    " ur"           -0.07
    " dt"           -0.07
    " BP"           -0.07
    " tử"           -0.07
    "endant"        -0.07
    Positive Logits
    " trade"        0.09
    " між"          0.09
    " mellom"       0.09
    "trade"         0.09
    (not rendered)  0.08
    "Trade"         0.08
    " Trade"        0.08
    " паміж"        0.08
    (not rendered)  0.08
    " между"        0.08
    Activation Density: 0.007%

    No Known Activations