INDEX
    Explanations

    phrases that express disagreement or doubts about opinions and their significance

    "not as" or similar negative comparisons

    NEGATIVE LOGITS
    Token             Logit
    " indeed"         -0.64
    " بيها"           -0.59
    " nonetheless"    -0.47
    " любом"          -0.44
    "อยู่"             -0.44
    "indeed"          -0.43
    " jopa"           -0.43
    "ftagPool"        -0.43
    "aver"            -0.42
    "idemiology"      -0.42

    POSITIVE LOGITS
    Token             Logit
    " tão"            1.04
    "那麼"            1.00
    " tantas"         0.99
    " autant"         0.99
    " столь"          0.99
    " lika"           0.98
    "那么"            0.94
    " tantos"         0.92
    " tanta"          0.88
    " толкова"        0.88

    Act Density 0.353%
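
The density figure can be read as the share of tokens on which the feature fires. The sketch below illustrates that reading only: it assumes "active" means an activation strictly greater than zero, which this page does not state, and both the function name `activation_density` and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Return the fraction of tokens with a non-zero (positive) activation.

    Assumption: a token counts as active when its activation is > 0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0.0)
    return active / len(activations)


# Hypothetical toy data: one active token out of four.
toy = [0.0, 0.0, 1.2, 0.0]
print(f"{activation_density(toy):.3%}")  # prints 25.000%
```

Under this reading, a density of 0.353% would correspond to roughly 353 active tokens per 100,000 sampled.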

    No Known Activations