    Explanations

    Code/reports/documents

    This neuron primarily activates on common short function words: articles (a, the), auxiliaries and modals (will, can), conjunctions (that), and simple prepositions.

    Negative Logits
    getC                -0.06
     Россия             -0.06
    /people             -0.06
    -img                -0.06
                        -0.06
     Pant               -0.06
     GM                 -0.06
     Imperial           -0.06
     Snowden            -0.06
    books               -0.06
    Positive Logits
    _lost               0.07
     ylabel             0.06
    _typeDefinition     0.06
     νεφ                0.06
     disc               0.06
     Dip                0.06
    loyd                0.06
     Spielberg          0.06
     lik                0.06
     replic             0.06
    Act Density 0.278%

    No Known Activations