INDEX
    Explanations
    Negative Logits
    "养老": -0.90
    " and": -0.89
    " other": -0.88
    " several": -0.85
    "menjadi": -0.82
    " dijadikan": -0.82
    "secara": -0.81
    " because": -0.79
    " while": -0.79
    "classic": -0.77
    Positive Logits
    " podjet": 0.95
    " پرس": 0.93
    "correspond": 0.92
    " مت": 0.90
    " najbolj": 0.89
    "ünk": 0.88
    " jars": 0.88
    " panor": 0.88
    "应该是": 0.87
    " quits": 0.86
    Activation density: 0.012%

    No Known Activations