INDEX
    Explanations
    NEGATIVE LOGITS
    Token             Logit
    [missing]         -0.09
    " Bradley"        -0.08
    " spur"           -0.07
    " microw"         -0.07
    " disag"          -0.07
    "олот"            -0.07
    [missing]         -0.07
    " Unternehmen"    -0.07
    "Govern"          -0.07
    "ingale"          -0.07
    POSITIVE LOGITS
    Token             Logit
    " encontr"        0.09
    [missing]         0.08
    " foi"            0.08
    "Stable"          0.07
    " awake"          0.07
    "Dip"             0.07
    ".merge"          0.07
    " cont"           0.07
    "lam"             0.07
    " creating"       0.07
    Activation Density: 0.014%

    No Known Activations