Explanations
The neuron activates on relative-clause trigger words, especially the phrase "those who."
Negative Logits
Token           Logit
releases        -0.07
ン              -0.07
.RadioButton    -0.06
epochs          -0.06
Mapper          -0.06
prints          -0.06
Director        -0.06
射              -0.06
touching        -0.06
econom          -0.06
Positive Logits
Token           Logit
-cigaret        0.06
_RESULT         0.06
searchData      0.06
_wrong          0.06
jj              0.06
_CPP            0.06
риз             0.06
Convenience     0.06
TEMPL           0.06
_UNSUPPORTED    0.06
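The page does not say how these logit scores are produced. One common way to get a neuron's logit effects is to project its output weight vector through the model's unembedding matrix and rank the vocabulary by the result. The sketch below assumes exactly that: a direct projection with no LayerNorm folding or rescaling. The function name `logit_effects` and its parameters are hypothetical and are not taken from any particular dashboard or library.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    w_out: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: rank tokens by the neuron's direct logit effect.

    w_out: the neuron's output weight vector (length d_model).
    w_u:   unembedding matrix with d_model rows and len(vocab) columns.
    Returns (k most negative, k most positive) as (token, score) pairs.
    Assumes a plain dot product; no LayerNorm folding is applied.
    """
    d_model = len(w_out)
    # score[t] = sum over i of w_out[i] * W_U[i][t]
    scores = [
        sum(w_out[i] * w_u[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative scores come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: only the first residual dimension feeds the neuron's output.
neg, pos = logit_effects(
    w_out=[1.0, 0.0],
    w_u=[[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

Under this assumption, uniformly small magnitudes like the ±0.06 values above would mean the neuron's output direction barely favors any single token over the others.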
Activation Density: 0.028%
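Activation density is usually read as the percentage of token positions on which the neuron fires. At 0.028%, that is about 28 active positions per 100,000 tokens. The sketch below assumes "active" means an activation strictly above a threshold of 0. The page does not state its threshold, and the helper `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of positions whose activation exceeds `threshold`.

    The threshold of 0.0 is an assumption, not the dashboard's documented rule.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        return 0.0  # no positions observed, so report zero density
    return 100.0 * active / total


# 28 active positions out of 100,000 gives a density of roughly 0.028%.
sample = [1.0] * 28 + [0.0] * 99_972
print(activation_density(sample))
```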