    Negative Logits
    "nab"           -0.07
    "prof"          -0.07
    "iculously"     -0.07
    " oversee"      -0.07
    " ver"          -0.07
    "ля"            -0.07
    " landscaping"  -0.07
    " Haf"          -0.07
    " welfare"      -0.07
    "berry"         -0.06
    Positive Logits
    "ísmo"          0.09
    ".rate"         0.09
    ":last"         0.08
    ":first"        0.08
                    0.08
    " بالك"         0.08
    " Fraction"     0.08
    "_rate"         0.08
    " fraction"     0.08
    " rates"        0.08
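The two lists above rank the tokens whose output logits this feature most decreases and most increases. The sketch below shows one common way such lists can be produced. It assumes, and the page does not state this, that a token's logit effect is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. `logit_effects`, `top_and_bottom`, and all vectors and tokens are hypothetical toy values, not the real model weights.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: effect}, where effect = decoder_dir . unembed[:, j].

    decoder_dir: list of d_model floats (hypothetical feature direction)
    unembed:     d_model x vocab_size nested list (hypothetical W_U)
    vocab:       list of vocab_size token strings
    """
    effects = {}
    for j, tok in enumerate(vocab):
        # Dot product of the feature direction with column j of W_U.
        effects[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
    return effects


def top_and_bottom(effects, k):
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]
    positive = sorted(ranked[-k:], key=lambda kv: kv[1], reverse=True)
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (illustrative only).
    vocab = ["rate", " fraction", " welfare", "nab"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [0.2, 0.1, -0.1, -0.05],
        [0.0, -0.1, 0.0, 0.04],
    ]
    neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Leading spaces are kept inside the token strings because, as in the lists above, `" fraction"` and `"fraction"` are distinct tokens.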
    Activation density: 0.013%
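Reading activation density as the percentage of sampled tokens on which the feature's activation is nonzero (an interpretation; the page gives no definition), 0.013% corresponds to roughly 13 firing tokens per 100,000. A minimal sketch of that calculation follows. The function name, the `threshold` parameter, and the activation values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical activations: 13 nonzero values among 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(0, 13_000, 1_000):  # positions 0, 1000, ..., 12000
        acts[i] = 1.5
    print(f"{activation_density(acts):.3f}%")
```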

    No Known Activations