    Explanations

    differences

    The neuron is sharply tuned to the word “differences” and to closely related comparative nouns such as “disparities”.

    Negative Logits
    " Newton"       -0.08
    " headed"       -0.08
    " Studio"       -0.07
    "oul"           -0.07
    "ANT"           -0.07
    " novels"       -0.06
    ".cljs"         -0.06
    " squat"        -0.06
    ".ViewModel"    -0.06
    " Weston"       -0.06
    Positive Logits
    " differences"   0.14
    " Differences"   0.13
    " difference"    0.12
    " Difference"    0.11
    " differ"        0.11
    "difference"     0.11
    "ifferences"     0.09
    "_difference"    0.09
    " differed"      0.08
    "Difference"     0.08
    Act Density 0.029%
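
For a sense of scale, the density figure can be read as the fraction of tokens on which the neuron is active. The sketch below assumes that "active" means an activation strictly above zero, which this page does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default). The page does not define this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 29 active tokens in a sample of 100,000 tokens.
acts = [0.0] * 99_971 + [1.5] * 29
density = activation_density(acts)
print(f"{density:.3%}")  # 0.029%
```

Under that reading, a density of 0.029% corresponds to roughly 29 active tokens per 100,000, which fits a neuron that responds to a single word family.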

    No Known Activations