Explanations
This neuron activates on the JSON key “reason,” identifying where the response explains or justifies a decision.
Negative Logits
αρι: -0.07
чес: -0.06
sku: -0.06
apyrus: -0.06
meats: -0.06
Yönetim: -0.06
递: -0.06
privacy: -0.06
پژوه: -0.06
ملة: -0.06
Positive Logits
sand: 0.07
anticipated: 0.06
�: 0.06
yms: 0.06
kyně: 0.06
_render: 0.06
newPath: 0.06
ента: 0.06
Chuck: 0.06
keiten: 0.06
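The page does not say how the logit scores were computed. A common approach, assumed here, is to take the dot product of the neuron's output weight vector with each token's unembedding vector, then list the most positive and most negative results. The sketch below uses toy weights; `w_out`, `unembed`, and `direct_logit_effects` are hypothetical names, not the dashboard's actual code.

```python
from typing import Dict, List, Tuple


def direct_logit_effects(
    w_out: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by direct logit effect.

    Assumption: a token's score is the dot product of the neuron's output
    weight vector with that token's unembedding vector.
    """
    # Score every token in the (toy) vocabulary.
    scores = {
        tok: sum(w * u for w, u in zip(w_out, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending so the most negative scores come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and three tokens.
w_out = [1.0, 0.0]
unembed = {"a": [0.5, 1.0], "b": [-0.3, 2.0], "c": [0.1, 0.0]}
pos, neg = direct_logit_effects(w_out, unembed, k=1)
print(pos, neg)
```

Under this assumption, the two lists above would be the ten highest- and lowest-scoring vocabulary entries for this neuron.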
Activation density: 0.018%
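Activation density is typically the fraction of token positions on which the neuron fires. The firing threshold is not stated on this page; the sketch below assumes "fires" means an activation strictly greater than 0. The function names are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: the dashboard counts a position as active when its
    activation is strictly above 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def as_percent(fraction: float) -> str:
    """Format a fraction as a percentage with three decimals."""
    return f"{fraction * 100:.3f}%"


# 18 active positions out of 100,000 tokens gives a density of 0.018%.
acts = [1.0] * 18 + [0.0] * 99_982
print(as_percent(activation_density(acts)))
```

At 0.018%, the neuron would be active on roughly 18 of every 100,000 tokens, which fits a narrow trigger such as a single JSON key.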