INDEX
    Explanations

    mentions of negation or denial

    Negative Logits
    Token                 Logit
    " defaultstate"       -0.85
    " raiſ"               -0.81
    " autorytatywna"      -0.80
    "ſelves"              -0.80
    "ſelf"                -0.77
    "parsedMessage"       -0.76
    "neſs"                -0.75
    "IntoConstraints"     -0.74
    " itſelf"             -0.73
    " reaſon"             -0.72
    Positive Logits
    Token                 Logit
    " Ni"                 1.05
    " ni"                 1.04
    "Ni"                  1.00
    " ne"                 0.85
    "就是"                0.65
    "就被"                0.59
    "нибудь"              0.58
                          0.58
    "nl"                  0.55
    " nem"                0.55
    Activation Density: 0.073%

    No Known Activations