    Negative Logits
    " contrasting"   -0.08
    " habido"        -0.07
    " contrasts"     -0.07
    " SEO"           -0.07
    "üst"            -0.07
    " yah"           -0.07
    "につ"           -0.07
    " degraded"      -0.07
    " ontst"         -0.07
    " turf"          -0.07
    Positive Logits
    "erton"          0.08
    "ман"            0.08
    "-bearing"       0.08
    "fit"            0.07
    " Niemand"       0.07
    "manager"        0.07
    "certificate"    0.07
    "iar"            0.07
    "appropriate"    0.07
    " Costume"       0.07
    Act Density: 0.005%

    No Known Activations