    Explanations

    connections between contrasting ideas or concepts

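    Negative Logits (token, logit; quotes mark leading spaces)
        " not"       -2.06
        " não"       -1.73   (Portuguese "not")
        " nicht"     -1.71   (German "not")
        "not"        -1.63
        " niet"      -1.63   (Dutch "not")
        " tidak"     -1.59   (Indonesian "not")
        " ikke"      -1.56   (Danish/Norwegian "not")
        " neither"   -1.50
        " không"     -1.48   (Vietnamese "not")
        " NOT"       -1.47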
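    Positive Logits (token, logit; quotes mark leading spaces)
        "而是"             0.79   (Chinese "but rather")
        " vielmehr"        0.77   (German "rather")
        "rungsseite"       0.75   (German subword fragment)
        " بلکه"            0.68   (Persian "but rather")
        " autorytatywna"   0.64   (Polish "authoritative")
        " sondern"         0.63   (German "but rather")
        " downright"       0.58
        "pyplot"           0.58
        " مرئيه"           0.57   (Arabic, roughly "visible")
        "むしろ"            0.56   (Japanese "rather")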
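    Logit lists like the two above are commonly read as a feature's direct effect on the output vocabulary: the feature's decoder direction projected through the model's unembedding, then sorted. The sketch below shows that ranking under stated assumptions. The names `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical, and the sketch ignores the final layer norm and any scaling the page's own tooling may apply.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (sketch).

    decoder_dir: (d_model,) decoder vector for the feature (hypothetical input)
    W_U:         (d_model, vocab_size) unembedding matrix (assumed layout)
    vocab:       list of token strings, length vocab_size
    """
    effects = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dim residual stream, 4-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[-2.0, 0.5, 0.8, 0.0],
              [1.0, 1.0, 1.0, 1.0]]),
    [" not", " rather", " sondern", "pyplot"],
    k=2,
)
print(neg)  # [(' not', -2.0), ('pyplot', 0.0)]
print(pos)  # [(' sondern', 0.8), (' rather', 0.5)]
```

    Read this way, the lists suggest the feature pushes toward "but rather" continuations (而是, sondern, بلکه, むしろ) and away from emitting another negation token.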
    Activation Density: 0.894%
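    Activation density is usually the share of tokens on which the feature fires. At 0.894%, that is roughly 9 of every 1,000 tokens, or about 1 in 112. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the function name `activation_density` is hypothetical):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    acts: 1-D sequence of the feature's activation on each token.
    Assumes at least one token (an empty input would divide by zero).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

print(activation_density([0.0, 0.0, 1.5, 0.2]))  # 50.0
```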

    No Known Activations