Explanations
The neuron is detecting the sequence “the” (i.e. the letters t-h-e, whether as the standalone article or embedded inside other words).
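The explanation says the neuron fires on the letter sequence t-h-e, both as the standalone article and inside longer words. The sketch below is a hypothetical rule-based proxy for that description. It is not the neuron's actual computation. It finds each occurrence of "the" in a string and labels it as the standalone article or as embedded in a larger word. Treating the match as case-insensitive is an assumption, because the explanation does not say whether case matters. The helper name `classify_occurrences` is illustrative.

```python
import re

def classify_occurrences(text: str) -> list[tuple[str, str]]:
    """Hypothetical proxy for the explanation: find every 't-h-e' letter
    sequence (case-insensitive, an assumption) and report the word that
    contains it, labelled 'article' or 'embedded'."""
    results = []
    for m in re.finditer(r"(?i)the", text):
        # Walk left and right from the match to recover the enclosing word.
        start, end = m.start(), m.end()
        while start > 0 and text[start - 1].isalpha():
            start -= 1
        while end < len(text) and text[end].isalpha():
            end += 1
        word = text[start:end]
        kind = "article" if word.lower() == "the" else "embedded"
        results.append((word, kind))
    return results

print(classify_occurrences("The other theory"))
```

With the input above, the proxy reports "The" as the article, and it reports "other" and "theory" as embedded occurrences. Under the explanation, the neuron would respond to all three.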
Negative Logits
Cust          -0.06
─             -0.06
ssize         -0.06
urging        -0.06
σκο           -0.06
petto         -0.06
$/,↵          -0.06
occurrence    -0.06
Law           -0.06
formed        -0.06
Positive Logits
směrem        0.07
limited       0.07
-|            0.07
returnType    0.07
없었다         0.06
alesce        0.06
lıyor         0.06
inadvert      0.06
織            0.06
guten         0.06
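The page does not say how the negative and positive logit lists are produced. A common approach, assumed here, is to project the neuron's output direction through the model's unembedding matrix `W_U` and then read off the lowest- and highest-scoring vocabulary tokens. The sketch below follows that assumed procedure. The function name `top_logit_effects` and the toy matrices are hypothetical.

```python
import numpy as np

def top_logit_effects(direction, W_U, vocab, k=10):
    """Assumed procedure: project a (d_model,) output direction through a
    (d_model, d_vocab) unembedding matrix and return the k most negative
    and k most positive (token, logit) pairs."""
    logits = direction @ W_U                  # shape: (d_vocab,)
    order = np.argsort(logits)                # ascending by logit
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy example: with an identity unembedding, the logits equal the direction.
vocab = ["a", "b", "c", "d"]
direction = np.array([0.5, -1.0, 2.0, 0.0])
bottom, top = top_logit_effects(direction, np.eye(4), vocab, k=2)
print(bottom, top)
```

In the toy example, "b" and "d" come out as the most negative tokens, and "c" and "a" come out as the most positive ones.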
Activation Density 0.057%
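Activation density is read here as the percentage of tokens on which the neuron activates. The page does not state the threshold, so the sketch below assumes an activation counts when it is strictly above zero. The function name and the token counts in the example are purely illustrative and are not taken from the page. For instance, 57 active tokens out of 100,000 would give 0.057%.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`
    (threshold of 0.0 is an assumption)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Illustrative only: 57 active positions among 100,000 tokens.
acts = np.zeros(100_000)
acts[:57] = 1.0
print(activation_density(acts))
```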