    Negative Logits
    Token           Logit
    " Mighty"       -0.09
    " continual"    -0.09
    " accommod"     -0.09
    " though"       -0.09
    "abel"          -0.09
    " Bod"          -0.08
    " subt"         -0.08
    " arr"          -0.08
    "_ops"          -0.08
    "747"           -0.08
    Positive Logits
    Token           Logit
    " folks"        0.10
    "誠"            0.10
    " Hurt"         0.09
    "noch"          0.08
    "诚"            0.08
    "梨"            0.08
    " Bbw"          0.08
    "988"           0.08
    "ray"           0.08
    "OX"            0.08
    Activation Density: 0.190%

    No Known Activations