INDEX
Explanations
The neuron fires on occurrences of the preposition “in,” especially when it introduces dates or category phrases.
Negative Logits
men       -0.07
Rubio     -0.07
Rif       -0.07
aval      -0.07
lev       -0.07
embargo   -0.06
conv      -0.06
Sun       -0.06
mož       -0.06
k         -0.06
Positive Logits
。これ        0.06
LogManager  0.06
alertView   0.06
conexao     0.06
_STATS      0.06
τρα         0.06
koneč       0.06
되었습니다    0.06
알아         0.06
unexpected  0.05
Activations Density: 0.001%