INDEX
Explanations
Negations and expressions of disagreement
Negative Logits
myſelf      -1.05
Efq         -0.89
uſed        -0.88
itſelf      -0.87
raiſ        -0.87
ſeveral     -0.86
Tacitus     -0.81
Manchuria   -0.81
Portály     -0.80
purpoſe     -0.79
Positive Logits
is          0.99
not         0.91
Not         0.85
WAS         0.79
我不是      0.78   ("I am not")
not         0.76
isn         0.73
不是        0.73   ("is not")
IsNot       0.71
being       0.70
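The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such a ranking could be produced, assuming (not confirmed by this page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, and the toy vectors below are made up for illustration, not taken from the real model.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct logit effect per token: dot product of the (hypothetical)
    feature decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Toy 2-dimensional example (illustrative values only).
decoder_dir = [1.0, -0.5]
unembed = {
    "is": [1.0, 0.0],
    "not": [0.4, -1.0],
    "itſelf": [-1.0, 0.0],
    "Tacitus": [0.0, 1.0],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(top)     # positive-logit list
print(bottom)  # negative-logit list
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other; for a real vocabulary a partial sort (for example `heapq.nlargest`) would avoid sorting every token.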
Activation density: 0.114%
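Activation density is read here as the share of tokens on which the feature fires. That reading is an assumption, as is the threshold: this sketch counts a token as active when its activation is strictly above 0, and the dashboard's exact cutoff is not stated. Under that reading, 0.114% corresponds to roughly 114 active tokens per 100,000. The function name `activation_density` is hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`
    (threshold of 0.0 is an assumed convention)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 114 firing tokens out of 100,000 gives a density of about 0.114%.
acts = [0.0] * 99_886 + [1.0] * 114
print(f"{activation_density(acts):.3f}%")
```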