    Explanations

    This neuron detects an explicit “Yes” or “No” answer at the start of a fact-consistency response.
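As a rough illustration of the surface pattern this explanation describes, the sketch below checks whether a response opens with an explicit "Yes" or "No". The function name `leading_yes_no` and the regular expression are hypothetical and chosen only for illustration; they are a text-level proxy, not the neuron's actual computation.

```python
import re
from typing import Optional

# Hypothetical pattern: optional leading whitespace and an optional opening
# quote, then "yes" or "no" as a whole word (case-insensitive).
_LEADING_ANSWER = re.compile(r"^\s*[\"'“]?(yes|no)\b", re.IGNORECASE)


def leading_yes_no(response: str) -> Optional[str]:
    """Return 'yes' or 'no' if the response starts with that answer, else None."""
    match = _LEADING_ANSWER.match(response)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    for text in [
        "Yes, the summary is consistent with the article.",
        "No. The summary contradicts the source.",
        "The summary is consistent.",
    ]:
        print(repr(text), "->", leading_yes_no(text))
```

The word-boundary check keeps words such as "Nobody" or "Yesterday" from being counted as answers.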

    Negative Logits
    “But         -0.08
    "But         -0.07
    Strings      -0.06
    ora          -0.06
    inet         -0.06
    ablytyped    -0.06
    “So          -0.06
     Seb         -0.06
    елю          -0.06
    	This     -0.06
    Positive Logits
     vois        0.07
    .templates   0.06
    (blank)      0.06
    -zone        0.06
    /md          0.06
     evidenced   0.06
    (blank)      0.06
     solidarity  0.06
    ]]↵          0.06
    (jQuery      0.06
    Activation density: 0.011%

    No Known Activations