    Explanations

    word in contrast to token

    Negative Logits
    " фармацев"          0.41
    [token missing]      0.41
    " utilisé"           0.41
    " fertilizers"       0.40
    "ента"               0.40
    " utilisés"          0.39
    "大好き"             0.39
    "比如说"             0.39
    " रिएक्टर"            0.39
    " pharmaceuticals"   0.39
    Positive Logits
    " but"      0.50
    " কিন্তু"     0.46
    " ancak"    0.46
    "but"       0.45
    "但不"      0.44
    " nhưng"    0.43
    " لكن"      0.43
    " પરંતુ"     0.43
    " אך"       0.42
    " ngunit"   0.42
    Activation Density: 0.006%

    No Known Activations