Explanations
The neuron fires on qualifier words in instructions, most prominently “relevant”, and also on similar qualifiers such as “possible”.
Negative Logits
/alert      -0.07
stalk       -0.07
DIS         -0.07
,看          -0.07
criptions   -0.06
dül         -0.06
Where       -0.06
设           -0.06
discs       -0.06
�           -0.06
Positive Logits
také        0.07
lonely      0.07
اجازه       0.07
.toFloat    0.07
无码          0.06
Λ           0.06
mdl         0.06
possível    0.06
Ngân        0.06
değildir    0.06
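The page does not say how the two logit lists were produced. Neuron dashboards commonly use direct logit attribution: project the neuron's output weight vector onto the unembedding matrix and list the most positive and most negative entries. The sketch below assumes that method. The names w_out and W_U, and the toy vocabulary and weights, are hypothetical stand-ins, not values from this model.

```python
from typing import List, Tuple

def logit_effects(w_out: List[float], W_U: List[List[float]]) -> List[float]:
    """Compute w_out @ W_U: the direct change in each vocabulary logit per
    unit of neuron activation (assumed direct-logit-attribution definition)."""
    d_model = len(w_out)
    vocab_size = len(W_U[0])
    return [
        sum(w_out[i] * W_U[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]

def top_k(effects: List[float], tokens: List[str], k: int,
          largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (or, if largest=False, the most
    negative) logit effects, rounded to two decimals like the lists above."""
    order = sorted(range(len(effects)), key=lambda v: effects[v], reverse=largest)
    return [(tokens[v], round(effects[v], 2)) for v in order[:k]]

# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
tokens = ["alpha", "beta", "gamma", "delta"]
w_out = [1.0, -0.5]                      # neuron's output weights (made up)
W_U = [[0.2, 0.1, -0.3, 0.0],            # unembedding, shape (d_model, vocab)
       [0.0, 0.4, 0.2, -0.1]]

effects = logit_effects(w_out, W_U)
print("positive:", top_k(effects, tokens, 2))               # [('alpha', 0.2), ('delta', 0.05)]
print("negative:", top_k(effects, tokens, 2, largest=False))  # [('gamma', -0.4), ('beta', -0.1)]
```

With real weights, calling top_k with k = 10 gives lists of the same length as the two shown on this page.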
Activation Density: 0.038%
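Activation density is usually the percentage of tokens in the evaluation corpus on which the neuron is active. Assuming "active" means an activation above zero (the page does not state the threshold it uses), 0.038% corresponds to about 38 active tokens per 100,000, so this neuron is rarely active. The helper activation_density and the synthetic activations below are hypothetical illustrations of that arithmetic.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.
    The default threshold of 0.0 is an assumption; the page does not state one."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical corpus of 100,000 token activations where the neuron fires 38 times.
acts = [0.0] * 100_000
for i in range(38):
    acts[i * 1000] = 1.5  # made-up firing positions and magnitude

print(f"{activation_density(acts):.3f}%")  # 0.038%
```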