Explanations
The neuron activates specifically on the word “text.”
Negative Logits
Token      Logit
Ге         -0.07
Lý         -0.07
(X         -0.06
οποίο      -0.06
uably      -0.06
Іван       -0.06
_zoom      -0.06
Nguyễn     -0.06
elper      -0.06
ancybox    -0.06
Positive Logits
Token           Logit
IMITER          0.06
winning         0.06
icontains       0.06
distant         0.06
/Subthreshold   0.06
Hoff            0.06
_TRIANGLES      0.06
입니다          0.06
!)↵↵            0.06
Previous        0.06
Activations Density 0.007%
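
The density figure can be read as the share of sampled tokens on which the neuron fires at all. The dashboard does not define it, so the sketch below assumes that reading: density is the fraction of token positions whose activation exceeds a threshold of zero. The function name, the threshold, and the sample data are hypothetical and chosen only to reproduce a value of 0.007%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens with a nonzero activation.
    The dashboard itself does not state the exact definition.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 7 active tokens out of 100,000.
sample = [0.0] * 99_993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.007% corresponds to roughly 7 activating tokens per 100,000, which fits an explanation tied to a single specific word.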