Explanations
This neuron selectively activates on occurrences of the directional preposition “onto.”
Negative Logits
Token    Logit
in       -0.12
In       -0.11
в        -0.10
-in      -0.09
In       -0.09
in       -0.09
IN       -0.09
_In      -0.08
_in      -0.08
14       -0.08
Positive Logits
Token      Logit
onto       0.13
onto       0.07
ulta       0.07
countert   0.07
oto        0.07
@nate      0.07
터         0.07
doğr       0.07
">'.$      0.07
upto       0.07
Activations Density 0.006%
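
For scale, a density of 0.006% corresponds to roughly 6 active token positions per 100,000. The sketch below shows one way such a figure could be computed, assuming density means the fraction of token positions where the neuron's activation exceeds zero; that definition, the `activation_density` helper, and the sample data are all illustrative assumptions, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions where the neuron fires at all.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 6 active positions out of 100,000 tokens.
sample = [0.0] * 99_994 + [1.5] * 6
density = activation_density(sample)
print(f"{density:.3%}")  # 0.006%
```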