Explanations
This neuron detects phrases that specify “N letters” (e.g. “two letters,” “three letters,” “four letters”) in the context of a probability or replacement description.
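As a rough illustration of the surface pattern this explanation names, the sketch below uses a regular expression to pick out "N letters" phrases. It only approximates the kind of text described and says nothing about how the neuron itself computes its activation. The number-word list, the function name `find_n_letters`, and the sample sentence are all assumptions made up for the example.

```python
import re

# Hypothetical illustration: approximates the text pattern in the explanation.
# The word list is an assumption; the neuron may respond to other counts too.
NUMBER_WORDS = ("two", "three", "four", "five", "six", "seven", "eight", "nine", "ten")

# Matches a digit string or a number word, whitespace, then the word "letters".
N_LETTERS = re.compile(
    r"\b(?:\d+|" + "|".join(NUMBER_WORDS) + r")\s+letters\b",
    re.IGNORECASE,
)


def find_n_letters(text: str) -> list[str]:
    """Return every 'N letters' phrase found in text, in order of appearance."""
    return [m.group(0) for m in N_LETTERS.finditer(text)]


# Made-up sample sentence in the style of a probability word problem.
sample = "Two letters picked without replacement from abcabc."
print(find_n_letters(sample))
```

A pattern match like this is far cruder than the neuron's behaviour: per the explanation, the context (a probability or replacement description) matters as well, and a regex on the phrase alone ignores it.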
Negative Logits
Token           Logit
Acceleration    -0.06
np              -0.06
fancy           -0.06
галі            -0.06
enn             -0.06
paed            -0.06
胡              -0.06
Shawn           -0.06
правда          -0.06
↔               -0.05
Positive Logits
Token           Logit
Module          0.07
.timeScale      0.07
الكتاب          0.07
segment         0.06
Package         0.06
firearm         0.06
ผล              0.06
>' ↵            0.06
initialize      0.06
liable          0.06
Activation Density: 0.001%