    Negative Logits
    Token          Value
    " команды"     0.64
    " fetus"       0.60
    " foothills"   0.60
    " redshifts"   0.59
    "اکر"          0.59
    " echi"        0.59
    " lowers"      0.57
    " nostrils"    0.57
    " pico"        0.56
    "ಗೊಳ್ಳ"         0.56
    Positive Logits
    Token          Value
    " of"          0.86
    "that"         0.78
    "U"            0.75
    "A"            0.72
    "W"            0.71
    "ของ"          0.71
    "ون"           0.70
    "ari"          0.69
    " của"         0.68
    "il"           0.67
    Activation Density: 0.003%

    No Known Activations