Explanations
This neuron activates on metalinguistic phrasing such as “when we say that…”, which signals that a definition or explanation is about to follow.
Negative Logits
okrat          -0.07
"Do            -0.06
_crit          -0.06
genotype       -0.06
Recycling      -0.06
cio            -0.06
히             -0.06
foods          -0.06
SE             -0.06
Personality    -0.06
Positive Logits
Erica          0.06
Aux            0.06
Xã             0.06
External       0.06
toast          0.06
produkt        0.06
тю             0.06
Olympia        0.06
breadcrumb     0.06
igner          0.06
Activations Density 0.029%
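
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of sampled tokens on which the neuron's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; they are not taken from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens where the neuron fires
    at all, not a measure of how strongly it fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 29 of 100,000 tokens.
sample = [1.5] * 29 + [0.0] * 99_971
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.029%
```

Under this reading, a density of 0.029% means the neuron is active on roughly 3 of every 10,000 tokens, which fits a neuron tied to a narrow phrasing pattern.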