INDEX
    Explanations

    phrases indicating contrast or concession

    Negative Logits
    Token           Logit
    " Majefty"      -0.93
    " Efq"          -0.84
    " المعيارى"     -0.84
    " Monfieur"     -0.84
    " crdi"         -0.80
    " myſelf"       -0.80
    "?");"          -0.76
    "?")"           -0.75
    "ſelves"        -0.73
    " raiſ"         -0.72
    Positive Logits
    Token             Logit
    "Nonetheless"     1.27
    " nevertheless"   1.25
    " Nonetheless"    1.24
    " nonetheless"    1.23
    "Nevertheless"    1.23
    " Nevertheless"   1.17
    "Still"           1.04
    " still"          0.99
    "anmoins"         0.97
    " dennoch"        0.97
    Activation Density: 0.018%

    No Known Activations