Explanations
This neuron detects instances of the phrase “Here in” that introduce a study’s methods or findings.
Negative Logits
ु�  -0.06
satire  -0.06
poids  -0.06
tantra  -0.06
ね  -0.06
ippet  -0.06
瀬  -0.06
Put  -0.06
prů  -0.06
넣  -0.06
Positive Logits
orthogonal  0.08
least  0.07
γεν  0.07
getElement  0.07
.getRight  0.07
ีช  0.07
protected  0.07
x  0.07
wise  0.07
rigor  0.06
Activations Density 0.001%