    Explanations

    This neuron activates on mentions of formal theoretical concepts in scientific text—especially occurrences of the word “theory” and closely related technical terms.

    Negative Logits
     그렇        -0.07
     subdued    -0.07
                -0.07
     Bilg       -0.07
     Grid       -0.07
    .As         -0.07
     Managing   -0.06
    (fields     -0.06
     Carolyn    -0.06
    /default    -0.06
    Positive Logits
     caches      0.06
     naive       0.06
    hesion       0.06
     chicken     0.06
    acha         0.06
     const       0.06
     apologize   0.06
                 0.06
    На           0.06
        ↵↵↵      0.06
    Act Density 0.035%

    No Known Activations