INDEX
    Explanations

    Followed by expected outputs

    NEGATIVE LOGITS
    wek
    0.73
     вообще
    0.72
    기도
    0.64
     можем
    0.64
     magari
    0.64
     quoted
    0.63
    0.63
     highlighter
    0.63
    有没有
    0.62
     કેટ
    0.62
    POSITIVE LOGITS
    Following
    0.77
    Chen
    0.76
    Expected
    0.75
    Monitoring
    0.70
    listening
    0.69
    expected
    0.69
    following
    0.68
     Franch
    0.67
     должно
    0.67
     Modulation
    0.67
    Act Density 0.066%

    No Known Activations