INDEX
    NEGATIVE LOGITS
    "处理"            -0.07
    " Silent"         -0.06
    " sharply"        -0.06
    " permanently"    -0.06
    ".ro"             -0.06
    " reordered"      -0.06
    " Drain"          -0.06
    " POWER"          -0.06
    " TEN"            -0.06
    "ACION"           -0.06
    POSITIVE LOGITS
    "-like"           0.13
    "like"            0.11
    "Lu"              0.08
    " like"           0.08
    "lik"             0.07
    "ike"             0.07
    "_like"           0.07
    "Democratic"      0.07
    " Catholic"       0.06
    "lık"             0.06
    Activation Density: 0.009%

    No Known Activations