    Explanations

    This neuron detects instances of the phrase “Here in” that introduce a study’s methods or findings.

    Negative Logits
    Token              Logit
    "ु�"                -0.06
    " satire"          -0.06
    " poids"           -0.06
    " tantra"          -0.06
    (no token text)    -0.06
    "ippet"            -0.06
    (no token text)    -0.06
    " Put"             -0.06
    " prů"             -0.06
    (no token text)    -0.06
    Positive Logits
    Token              Logit
    " orthogonal"      0.08
    "least"            0.07
    "γεν"              0.07
    "getElement"       0.07
    ".getRight"        0.07
    "ีช"                0.07
    "protected"        0.07
    "x"                0.07
    "wise"             0.07
    " rigor"           0.06
    Activation Density: 0.001%

    No Known Activations