Explanations
contradiction
This neuron fires on words signaling logical contradiction or refutation (e.g. “contradicts”).
Negative Logits
Token           Logit
Assurance       -0.07
selected        -0.07
.case           -0.06
Charter         -0.06
婆              -0.06
issued          -0.06
Compensation    -0.06
venta           -0.06
quindi          -0.06
<center         -0.06
Positive Logits
Token           Logit
komen           0.08
jars            0.07
εισ             0.06
올              0.06
abe             0.06
(font           0.06
ITO             0.06
MS              0.06
Jays            0.06
ätz             0.06
Activations Density: 0.008%