    NEGATIVE LOGITS
    Token              Logit
    "heter"            -0.09
    " json"            -0.08
    "approx"           -0.08
    "ij"               -0.08
    "-Württemberg"     -0.08
    ".persist"         -0.08
    "eight"            -0.07
    "VIII"             -0.07
                       -0.07
                       -0.07
    POSITIVE LOGITS
    Token              Logit
    " pola"            0.09
    " cardio"          0.09
    " ditch"           0.08
    " ENC"             0.08
    " CSA"             0.08
    " Interactive"     0.08
    " shoreline"       0.08
    " Female"          0.08
    " dilem"           0.08
                       0.07
    Activation Density: 0.001%

    No Known Activations