    Explanations

    understanding and feeling states

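    Negative Logits
    Token            Value   English gloss
    " puedan"        0.31    Spanish: "can" (subjunctive, third person plural)
    "调整"           0.30    Chinese: "adjust"
    " column"        0.29
    "それぞれ"       0.29    Japanese: "each" / "respectively"
    " matching"      0.29
    "提取"           0.29    Chinese: "extract"
    "查詢"           0.29    Chinese: "query" / "look up"
    " extracting"    0.29
    "提供"           0.28    Chinese: "provide"
    " handouts"      0.28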
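    Positive Logits
    Token            Value   English gloss
    " merasa"        0.46    Indonesian: "feel"
    " ненави"        0.42    Russian: word fragment, stem of "hate"
    " любить"        0.42    Russian: "to love"
    " знаете"        0.42    Russian: "you know"
    " чувство"       0.40    Russian: "feeling"
    "know"           0.40
    " know"          0.40
    " knew"          0.39
    "觉得自己"       0.39    Chinese: "feel that oneself"
    " любит"         0.38    Russian: "loves"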
    Activation Density: 0.164%

    No Known Activations