Explanations
The neuron specifically fires on occurrences of words denoting antiquity—most notably the adjective “old.”
Negative Logits
guilty  -0.07
Peek  -0.07
وني  -0.07
при  -0.07
rvé  -0.07
increasing  -0.07
وک  -0.07
-ms  -0.06
Damn  -0.06
828  -0.06
Positive Logits
old  0.11
vieille  0.08
Old  0.07
Αγ  0.06
Old  0.06
banco  0.06
OldData  0.06
Radar  0.06
motors  0.06
老  0.06
Activations Density 0.012%
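
The density figure above can be read as the share of token positions on which the neuron is active. The sketch below is one plausible way to compute such a number, not necessarily how this page derives it: the function name `activation_density`, the zero threshold, and the sample activations are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero activation.
    The page itself does not state the exact definition it uses.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 3 active positions out of 25,000 tokens.
sample = [0.0] * 24_997 + [0.4, 1.2, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the neuron is silent on almost all tokens, which fits a unit that responds to a narrow set of words.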