INDEX
    Explanations

    while acknowledging or contrasting

    Negative Logits
    rijf  0.56
     वारदात  0.51
    恋爱  0.50
     કર્યા  0.50
     સંપૂર્ણ  0.47
     உடனடியாக  0.47
    מש  0.46
     Geç  0.46
     мелдеш  0.45
    就这样  0.45
    Positive Logits
     in  0.47
     لله  0.44
     لح  0.43
     constitutions  0.42
     cancers  0.41
    0.41
     patients  0.41
     hil  0.41
     affordability  0.41
     climatic  0.40
    Act Density 0.005%

    No Known Activations