INDEX
    Explanations

    beyond, below, or above thresholds

    Negative Logits
    (token not shown)   0.35
    "较低"              0.34
    " nimic"            0.34
    " Nearly"           0.33
    " Exactly"          0.33
    " उत्‍"              0.32
    " Whoever"          0.32
    " Tripathi"         0.32
    "ública"            0.32
    " τρόπο"            0.32
    Positive Logits
    " bounds"           0.70
    " threshold"        0.68
    " limits"           0.65
    " thresholds"       0.64
    " предела"          0.62
    " limites"          0.60
    "threshold"         0.59
    " límites"          0.59
    " boundaries"       0.59
    " reproach"         0.59
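    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k highest- and k lowest-scoring tokens. The sketch below assumes that convention; `logit_lists`, its arguments, and the toy matrices are hypothetical and are not this feature's weights. It returns signed scores, whereas the list above prints the negative-side values without a sign, so the dashboard's display convention may differ.

```python
def logit_lists(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by how strongly a feature direction promotes them.

    decoder_dir: hypothetical feature direction, a list of d_model floats.
    unembed: hypothetical unembedding matrix W_U, d_model rows of vocab_size floats.
    vocab: token strings, one per column of unembed.
    Returns (positive, negative): the k highest- and k lowest-scoring
    (token, score) pairs, with scores kept signed.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    d_model = len(decoder_dir)
    # Score each token as the dot product of the direction with that token's W_U column.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort token ids from lowest to highest score.
    order = sorted(range(len(vocab)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy example (made-up values): 2-dimensional model, 3-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c"]
decoder_dir = [1.0, 0.0]
unembed = [[0.5, -0.2, 0.1], [0.3, 0.3, 0.3]]
positive, negative = logit_lists(decoder_dir, unembed, vocab, k=1)
print(positive)  # [('tok_a', 0.5)]
print(negative)  # [('tok_b', -0.2)]
```

    Sorting once and slicing both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial top-k selection would avoid the full sort.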
    Activation density: 0.053%
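    Activation density on dashboards like this is typically the share of sampled tokens on which the feature's activation is above zero; 0.053% corresponds to about 53 tokens in every 100,000. The sketch below assumes that definition; `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold.

    activations: hypothetical per-token activations for one feature.
    threshold: hypothetical cutoff; 0.0 counts any positive activation.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```

    With 53 firing positions out of 100,000 sampled tokens, the function returns 0.053, matching the figure above.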

    No Known Activations