INDEX
    Explanations
    Negative Logits
    " animals"       -0.08
    "boll"           -0.08
    "152"            -0.08
    "Natural"        -0.07
    " translating"   -0.07
    " understand"    -0.07
    " optional"      -0.07
    "223"            -0.07
    "License"        -0.07
    " argu"          -0.07
    Positive Logits
    "疯狂"            0.10
    " эми"           0.09
    (blank)          0.09
    " aggressively"  0.09
    ",好"            0.08
    "继续"            0.08
    " ادامه"         0.08
    "努力"            0.08
    " منتشر"         0.08
    "-money"         0.08
    Activation Density: 0.005%

    No Known Activations