Explanations
The neuron fires on single-word answer tokens given by the assistant (e.g. “Bulldozer,” “Lemon,” “Egg,” “QTabBar,” “stderr,” “Death”).
Negative Logits
.findAll     -0.06
زیست         -0.06
ocoder       -0.06
             -0.06
_printf      -0.06
Customer     -0.06
Ваш          -0.06
include      -0.06
knowledge    -0.06
зация        -0.06
Positive Logits
име          0.08
قام          0.07
Ödül         0.07
Winter       0.07
xuống        0.07
-Jun         0.07
उद           0.06
Ins          0.06
Civ          0.06
选           0.06
Activations Density 0.071%
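
The density figure can be read as the share of tokens on which the neuron is active. The sketch below illustrates that reading only; it assumes "active" means an activation strictly above a threshold of 0, which the page itself does not state. The function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold. The dashboard may use a different rule.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 71 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 71 + [0.0] * (100_000 - 71)
print(f"{activation_density(sample):.3f}%")  # prints "0.071%"
```

Under this reading, a density of 0.071% means the neuron fires on roughly 7 of every 10,000 tokens, which fits a feature tied to a narrow pattern such as single-word answers.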