INDEX
Explanations
This neuron fires when the word “prob” appears, i.e. it detects prompts that pose probability questions.
Negative Logits
Token           Logit
PLE             -0.07
_SER            -0.07
UTC             -0.06
صاح             -0.06
Junction        -0.06
�               -0.06
.getProperty    -0.06
.demo           -0.06
Fully           -0.06
sailors         -0.06
Positive Logits
Token           Logit
мо              0.06
memiliki        0.06
/st             0.06
-prom           0.06
рії             0.06
.rc             0.06
基金            0.06
absent          0.05
exemptions      0.05
Complex         0.05
Activations Density 0.001%