INDEX
    Explanations

    Phrases that express contrast or alternative perspectives

    Sentences containing "but" or its foreign equivalents

    Negative Logits
    Token           Logit
    "Without"       -0.64
    " Without"      -0.61
    " Tanpa"        -0.59
    "without"       -0.58
    "تقاوى"         -0.58
    " gradova"      -0.58
    " fără"         -0.56
    " WITHOUT"      -0.55
    "Переваги"      -0.54
    "AutoScale"     -0.53
    Positive Logits
    Token           Logit
    " vielmehr"     1.62
    "むしろ"        1.31
    "而是"          1.25
    " anzi"         1.02
    " rather"       1.01
    " instead"      1.01
    "あくまで"      0.98
    " sondern"      0.97
    " Instead"      0.94
    " بلکه"         0.93
    Activation Density: 0.202%

    No Known Activations