INDEX
    Explanations

    contradiction

    This neuron fires on words signaling logical contradiction or refutation (e.g. “contradicts”).

    Negative Logits
    Token              Logit
    " Assurance"       -0.07
    "\tselected"       -0.07
    ".case"            -0.06
    " Charter"         -0.06
    (no token shown)   -0.06
    " issued"          -0.06
    " Compensation"    -0.06
    "venta"            -0.06
    " quindi"          -0.06
    "<center"          -0.06
    Positive Logits
    Token              Logit
    " komen"           0.08
    " jars"            0.07
    " εισ"             0.06
    (no token shown)   0.06
    "abe"              0.06
    "(font"            0.06
    "ITO"              0.06
    "MS"               0.06
    " Jays"            0.06
    "ätz"              0.06
    Activation Density: 0.008%

    No Known Activations