Explanations
The neuron fires on occurrences of the preposition “into.”
Negative Logits
d     -0.09
z     -0.09
l     -0.08
k     -0.08
j     -0.08
δ     -0.08
el    -0.08
λ     -0.08
b     -0.07
w     -0.07
Positive Logits
into    0.20
Into    0.18
Into    0.14
_into   0.12
onto    0.11
into    0.11
INTO    0.11
.into   0.11
In      0.11
TO      0.10
Activations Density: 0.079%
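
One way to read the density figure is as the fraction of token positions on which the neuron activates at all. The sketch below illustrates that reading only. It assumes density is defined as the share of positions with activation above a threshold of zero, which the page itself does not state; the function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    neuron is active. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which activate.
acts = [0.0] * 9992 + [1.5] * 8
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.079% would mean the neuron is active on roughly 8 of every 10,000 tokens, which fits a unit that responds to a single word such as "into".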