    Negative Logits
    Token          Logit
    (blank)        -0.08
    "-game"        -0.08
    (blank)        -0.07
    (blank)        -0.07
    " sever"       -0.07
    (blank)        -0.07
    " unders"      -0.07
    "aho"          -0.07
    " declined"    -0.07
    "inng"         -0.07
    Positive Logits
    Token          Logit
    " numerator"   0.08
    " Ober"        0.08
    " erstes"      0.08
    " israel"      0.07
    " Dazu"        0.07
    " Pinot"       0.07
    "poly"         0.07
    " daarop"      0.07
    " Sto"         0.07
    " handels"     0.07
    Activation Density: 0.000%

    No Known Activations