Explanations
mentions of negation or denial
Negative Logits
defaultstate      -0.85
raiſ              -0.81
autorytatywna     -0.80
ſelves            -0.80
ſelf              -0.77
parsedMessage     -0.76
neſs              -0.75
IntoConstraints   -0.74
itſelf            -0.73
reaſon            -0.72
Positive Logits
Ni       1.05
ni       1.04
Ni       1.00
ne       0.85
就是     0.65
就被     0.59
нибудь   0.58
就       0.58
nl       0.55
nem      0.55
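The dashboard does not say how these lists are computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix, giving one score per vocabulary token, and then report the k highest scores as positive logits and the k lowest as negative logits. The sketch below uses hypothetical names (`decoder_dir`, `unembed`, `logit_effects`, `top_bottom`) and toy 2-dimensional vectors, not the real model weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (hypothetical) feature decoder
    direction with that token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive scores (largest first) and the
    k most negative scores (most negative first), as the lists above are ordered."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy data: 2-d decoder direction and unembedding vectors (hypothetical).
    decoder_dir = [1.0, -0.5]
    unembed = {
        "ni": [1.0, 0.0],
        "ne": [0.5, 0.0],
        "self": [0.0, 1.0],
        "reason": [-0.2, 0.4],
    }
    pos, neg = top_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("Positive:", pos)
    print("Negative:", neg)
```

With real weights, the vocabulary would have tens of thousands of tokens and the vectors would have the model's hidden size, but the ranking step is the same.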
Activation Density: 0.073%
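Activation density is the share of token positions on which the feature fires. A density of 0.073% corresponds to about 73 active positions per 100,000 tokens. The sketch below assumes that "fires" means an activation strictly greater than zero, as with a ReLU-style sparse autoencoder. The function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of token positions where the feature activation is > 0
    (assumed firing criterion)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy sample: 73 active positions out of 100,000 tokens.
    acts = [0.0] * 99_927 + [1.0] * 73
    print(f"{activation_density(acts):.3f}%")
```

The toy sample is constructed to match the reported figure and is not real activation data.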