INDEX
    Explanations

    notions of negation or contrast in various contexts

    Negative Logits
                -0.62
    " A"        -0.56
    "A"         -0.54
    "N"         -0.49
    "<strong>"  -0.48
    " N"        -0.44
    " and"      -0.44
    "L"         -0.44
    "amp"       -0.43
    "G"         -0.43
    Positive Logits
    " itſelf"         1.20
    "aarrggbb"        1.19
    " myſelf"         1.06
    " Theſe"          1.06
    " faſt"           1.04
    " autorytatywna"  1.03
    " Reſ"            1.01
    " consultato"     1.00
    " Monfieur"       0.98
    " raiſ"           0.97
    Act Density 0.294%

    No Known Activations