INDEX
    Explanations

    phrases indicating quantities, sizes, or relative importance

    Negative Logits
    Token (quoted to show leading spaces)    Logit
    "المناصب"           -0.62
    ":]."               -0.59
    " مشين"             -0.53
    "原始内容存档于"      -0.53
    "Thick"             -0.51
    " Thick"            -0.51
    "льше"              -0.49
    " mę"               -0.49
    " assoluto"         -0.48
    "وعة"               -0.47
    Positive Logits
    Token (quoted to show leading spaces)    Logit
    " small"            2.98
    "small"             2.64
    "Small"             2.51
    " tiny"             2.50
    " SMALL"            2.42
    " Small"            2.40
    "SMALL"             2.29
    " smaller"          2.23
    " pequeño"          2.16
    " smallest"         2.13
    Activation Density: 1.341%

    No Known Activations