Explanations
The neuron strongly activates on second-person address words (e.g. “you,” “your”) and related reader-directed phrasing.
Negative Logits
Austria    -0.07
Helena     -0.06
verture    -0.06
Indust     -0.06
Advent     -0.06
strup      -0.06
turtles    -0.06
nice       -0.06
менш       -0.06
jugar      -0.06
Positive Logits
.serialize    0.07
_BAR          0.06
knull         0.06
glow          0.06
(ex           0.06
тобы          0.06
Median        0.06
/runtime      0.06
.Quad         0.06
(Font         0.06
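The page does not say how these two lists are produced. A common approach, assumed here, is to give each vocabulary token a score for how much the neuron pushes its output logit up or down, then keep the highest and lowest scores. The following is a minimal sketch of that selection step only. The function name `top_logits` and the input dictionary are hypothetical, and the sample scores are a few values copied from the lists above.

```python
def top_logits(effects, k, negative=False):
    """Return the k (token, score) pairs with the largest scores,
    or the most negative scores when negative=True.

    Hypothetical helper: assumes `effects` maps each token to the
    neuron's (assumed) direct effect on that token's output logit.
    """
    ordered = sorted(effects.items(), key=lambda kv: kv[1], reverse=not negative)
    return ordered[:k]


# A handful of scores taken from the lists above.
sample = {".serialize": 0.07, "_BAR": 0.06, "Austria": -0.07, "Helena": -0.06}
print(top_logits(sample, 1))                 # strongest positive entry
print(top_logits(sample, 1, negative=True))  # strongest negative entry
```

With only ten entries per list and most scores tied at ±0.06, small changes in the underlying scores could reorder the tied entries.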
Activation Density: 0.018%
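The page does not define activation density. Assuming it means the percentage of sampled tokens on which the neuron's activation exceeds a threshold (taken here as 0), 0.018% works out to about 18 activating tokens per 100,000. The sketch below checks that arithmetic. The function name `activation_density` and the zero threshold are illustrative assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is above `threshold`.

    Assumed definition: the source page does not state how density is
    computed or which threshold it uses.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 18 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(18):
    acts[i * 5000] = 1.0  # spread the 18 activations across the sample
print(activation_density(acts))
```

A density this low suggests the neuron stays silent on almost all tokens, which is consistent with the narrow second-person trigger described above.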