    Negative Logits
    Token         Logit
     khảo         -1.02
     intend       -0.93
    Trigon        -0.88
    неве          -0.84
     Wikimedia    -0.82
    zzi           -0.80
     unclear      -0.80
    你觉得        -0.79
     only         -0.79
    ")->          -0.79

    Positive Logits
    Token         Logit
     knows        1.48
     know         1.47
     probably     1.43
     doubtless    1.41
     уже          1.29
     already      1.28
     surely       1.28
     zapew        1.24
    probably      1.21
     will         1.19

    Activation Density: 0.055%

    No Known Activations