    NEGATIVE LOGITS
     fry
    -0.08
    MLE
    -0.08
    uid
    -0.07
     unbedingt
    -0.07
    chalk
    -0.07
     dringend
    -0.07
     wr
    -0.07
    .mark
    -0.07
    alari
    -0.07
    112
    -0.07
    POSITIVE LOGITS
    体现
    0.10
     teamwork
    0.08
     embody
    0.08
    理念
    0.08
     embodied
    0.07
     Arist
    0.07
    0.07
    zeg
    0.07
     presently
    0.07
     cupid
    0.07
    Activation Density: 0.009%

    No Known Activations