    Negative Logits
    " to"        -2.39
    " first"     -2.17
    " only"      -2.16
    " about"     -2.13
    " just"      -1.96
    " in"        -1.88
    " one"       -1.86
    " more"      -1.82
    " larger"    -1.81
    " by"        -1.78
    Positive Logits
    " them"        1.98
    " magnific"    1.83
    " antaranya"   1.72
    " separat"     1.71
    " perfecte"    1.63
    " Badan"       1.62
    "了起来"        1.61
    "会的"          1.61
    [not shown]    1.60
    "惊艳"          1.59
    Activation Density: 0.005%

    No Known Activations