INDEX
    Explanations

    references to notable historical figures and their contributions to science

    Negative Logits
    的一个
    -0.19
     dieser
    -0.16
    ，它
    -0.14
    的一
    -0.14
    这种
    -0.13
     này
    -0.13
     이런
    -0.12
    是一个
    -0.12
     bunu
    -0.12
     diese
    -0.12
    Positive Logits
     the
    1.06
    the
    0.76
    	the
    0.66
    _the
    0.59
    .the
    0.52
    ,the
    0.49
    -the
    0.48
     teh
    0.44
     THE
    0.42
    ethe
    0.37
    Act Density 2.012%

    No Known Activations