INDEX
    Explanations

    single codebase

    Negative Logits
        " RD"            -0.09
        " malt"          -0.07
        "RD"             -0.07
        " various"       -0.07
        "RV"             -0.07
        " "              -0.07
        " verschiedene"  -0.07
        " empfe"         -0.07
        "		 "          -0.07
    Positive Logits
        "beitet"          0.09
        "aisa"            0.09
        "כר"              0.09
        " camiseta"       0.08
        "orsunuz"         0.08
        "adeed"           0.08
        "manageable"      0.08
        " formazione"     0.08
        " giveaways"      0.08
        "cional"          0.08
    Activation Density 0.001%

    No Known Activations