Explanations
The neuron almost never activates, and it does not pick out any particular token or pattern.
Negative Logits
Bring       -0.07
Firstly     -0.07
babel       -0.06
еты         -0.06
(Tile       -0.06
.called     -0.06
/mock       -0.06
projet      -0.06
-*-         -0.06
criticism   -0.06
Positive Logits
harma       0.06
ụp          0.06
prohibited  0.06
fuse        0.06
izacao      0.06
443         0.06
uffed       0.06
ausal       0.06
deprived    0.06
]); ↵ ↵     0.06
Activations Density 0.006%
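
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions on which the neuron's activation is above zero. This is an assumption about the metric, not a description of how this page derives it; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 6 active positions out of 100,000 tokens.
acts = [0.0] * 99_994 + [1.2, 0.4, 0.9, 2.1, 0.3, 0.7]
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

Under this assumed definition, a density of 0.006% corresponds to roughly 6 active positions per 100,000 tokens, which is consistent with an explanation that finds no clear pattern.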