    Explanations

    negative phrasing and contrastive statements

    Negative Logits
    AsUp                -0.69
    makeText            -0.58
    DispatchToProps     -0.54
    Бахар               -0.45
    ниципа              -0.44
    XMLSchema           -0.44
    spli                -0.43
     but                -0.42
    లి                  -0.42
    ]<<"                -0.41
    Positive Logits
     merely             0.99
     solely             0.97
     tantum             0.91
     alone              0.91
     only               0.89
     onely              0.85
     wyłącznie          0.85
    only                0.81
     exclusively        0.80
    alnız               0.78
    Activation Density: 0.285%

    No Known Activations