Explanations
common English words
This neuron is effectively “dead”: it almost never activates, so it does not appear to detect any pattern in the text.
Negative Logits
Token               Logit
zijn                -0.07
Ubergraph           -0.07
Cater               -0.07
responsibilities    -0.07
Removes             -0.07
putas               -0.07
*                   -0.06
ull                 -0.06
exchange            -0.06
ruins               -0.06
Positive Logits
Token               Logit
iệng                0.06
disagrees           0.06
ことで              0.06
ystick              0.06
(EXIT               0.06
.mouse              0.06
edil                0.06
NZ                  0.06
Route               0.06
recursive           0.06
Activations Density 0.058%
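
As a rough illustration of how a density figure like the one above can be read, the sketch below treats activation density as the fraction of token positions on which the neuron's activation is above zero. That definition, the function names `activation_density` and `is_effectively_dead`, the 0.1% cutoff, and the sample data are all assumptions made for this example; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration; the dashboard may compute it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def is_effectively_dead(density, cutoff=0.001):
    """Hypothetical rule of thumb: flag neurons firing on under 0.1% of tokens."""
    return density < cutoff


# Made-up data: 10,000 token positions, 6 of which have a nonzero activation.
acts = [0.0] * 10_000
for i in (12, 480, 1999, 4096, 7777, 9001):
    acts[i] = 0.3

density = activation_density(acts)
print(f"density = {density:.3%}, effectively dead: {is_effectively_dead(density)}")
```

Under these assumptions, a density of 0.058% falls below the hypothetical cutoff, which fits the "effectively dead" explanation while still allowing for rare activations.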