INDEX
Explanations
negations or phrases indicating the absence of something
Negative Logits
Token      Logit
Steck      -0.64
Laird      -0.62
+          -0.59
cime       -0.59
Chau       -0.57
miu        -0.57
Jinping    -0.56
           -0.56
Luiz       -0.56
ervo       -0.56
Positive Logits
Token      Logit
NOT        1.37
NOT        1.23
Not        1.22
Not        1.20
not        1.15
isNot      0.92
ENOT       0.83
assertNot  0.81
IsNot      0.79
not        0.78
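For readers unfamiliar with this kind of listing, the sketch below shows one common way such per-token logit scores can be produced: project a feature's output direction onto each token's unembedding vector, then rank the results. This is an assumption about how the numbers above were generated, not something the page states. The names `logit_effects`, `top_and_bottom`, `direction`, and `unembed` are hypothetical, and the toy vectors are made up.

```python
def logit_effects(direction, unembed):
    """Dot the feature direction with each token's unembedding row."""
    return {
        token: sum(d * w for d, w in zip(direction, row))
        for token, row in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most negative first
    return most_positive, most_negative


# Toy 2-dimensional example; every number here is invented for illustration.
direction = [1.0, -0.5]
unembed = {
    "NOT": [1.2, -0.2],
    "not": [1.0, 0.0],
    "the": [0.1, 0.1],
    "Laird": [-0.5, 0.3],
}

effects = logit_effects(direction, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

In this toy setup the negation tokens rank highest and the unrelated name ranks lowest, which mirrors the shape of the two lists above.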
Activations Density 0.122%
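As a rough reading aid, the sketch below assumes that activation density means the fraction of sampled tokens on which the feature fires (activation above zero). The page does not define the term, so treat that definition, the function name `activation_density`, and the toy data as assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (invented): 5 active tokens out of 4,000 sampled tokens.
toy = [0.0] * 3995 + [1.5, 0.2, 3.1, 0.7, 0.9]
density = activation_density(toy)
print(f"toy density: {density:.3%}")

# Interpreting the reported figure under the same assumption:
# a density of 0.122% corresponds to roughly one active token in 820.
reported = 0.122 / 100
tokens_per_activation = round(1 / reported)
print(f"about 1 in {tokens_per_activation} tokens")
```

Under this reading, the feature is sparse: it stays silent on the vast majority of tokens and fires mainly on negation-like tokens.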