INDEX
    Negative Logits
    管理和
    -0.29
     somewhere
    -0.28
    icle
    -0.27
    的巨大
    -0.26
     trick
    -0.25
     remotely
    -0.25
    cies
    -0.24
    的程度
    -0.24
    看上去
    -0.24
     degree
    -0.24
    Positive Logits
    apore
    0.27
    FORE
    0.27
    强化
    0.26
    Courses
    0.26
    リン
    0.26
    做强
    0.26
    ột
    0.25
     Lect
    0.24
    oken
    0.24
    abbr
    0.24
    Activation Density 0.048%

    No Known Activations