    Explanations

    The neuron fires specifically on words denoting antiquity, most notably the adjective “old.”

    Negative Logits
    Token            Logit
    " guilty"        -0.07
    " Peek"          -0.07
    "وني"            -0.07
    "при"            -0.07
    "rvé"            -0.07
    " increasing"    -0.07
    "وک"             -0.07
    "-ms"            -0.06
    " Damn"          -0.06
    "828"            -0.06
    Positive Logits
    Token            Logit
    " old"           0.11
    " vieille"       0.08
    " Old"           0.07
    " Αγ"            0.06
    "Old"            0.06
    " banco"         0.06
    "OldData"        0.06
    " Radar"         0.06
    " motors"        0.06
                     0.06
    Activation Density: 0.012%
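
    The page does not say how the logit lists and the density figure are computed. The following is a minimal sketch under two assumptions: that each listed value is the change in a token's output logit attributable to the neuron, and that density is the fraction of sampled tokens on which the neuron's activation is above zero. The helper names `top_logits` and `activation_density`, and all toy data, are hypothetical and not taken from the page.

```python
# Hypothetical helpers; names and toy data are illustrative, not from the page.

def top_logits(logit_effects, k=10):
    """Return (most_negative, most_positive) lists of (token, value) pairs.

    logit_effects maps each vocabulary token to the ASSUMED change in its
    output logit when the neuron fires.
    """
    ranked = sorted(logit_effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # lowest values first
    positive = ranked[::-1][:k]    # highest values first
    return negative, positive


def activation_density(activations, threshold=0.0):
    """Fraction of sampled tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy values only; a few echo the lists above for readability.
effects = {
    " old": 0.11,
    " vieille": 0.08,
    " Old": 0.07,
    " guilty": -0.07,
    " Damn": -0.06,
}
negative, positive = top_logits(effects, k=2)
print(negative)
print(positive)

# Toy sample: 3 firing tokens out of 25,000 (an invented sample size).
acts = [0.0] * 24997 + [1.3, 0.4, 2.1]
print(f"{activation_density(acts):.3%}")
```

    Under these assumptions, a density of 0.012% would mean the neuron is active on roughly 1 in every 8,000 sampled tokens, which fits a neuron tied to a narrow set of words.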

    No Known Activations