INDEX
    Explanations
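    Negative Logits
    Token              Logit
    (not recovered)    -2.95
    " новых"           -2.94
    "觉得"             -2.88
    (not recovered)    -2.66
    "they"             -2.63
    (not recovered)    -2.50
    (not recovered)    -2.44
    (not recovered)    -2.36
    ":“"               -2.34
    "while"            -2.33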
    Positive Logits
    Token              Logit
    " When"            2.53
    " Они"             2.38
    " einigen"         2.38
    " WAY"             2.36
    "ла"               2.34
    "』("              2.31
    " OTHER"           2.27
    " emeritus"        2.23
    " originalmente"   2.20
    " accordance"      2.19
    Activation Density: 0.019%

    No Known Activations