INDEX
Explanations
This neuron activates specifically on the token “break”.
Negative Logits
"{}          -0.07
Laden        -0.06
@property    -0.06
ोह           -0.06
ite          -0.06
(){↵         -0.06
oldest       -0.06
новид        -0.06
adi          -0.06
Netflix      -0.06
Positive Logits
JUnit        0.07
Albuquerque  0.07
.Margin      0.07
=g           0.06
.divide      0.06
balances     0.06
Explicit     0.06
ategor       0.06
],[-         0.06
апреля       0.06
Activations Density 0.001%
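
The density figure above can be read as the fraction of token positions on which the neuron fires. The following is a minimal sketch of that reading, not the dashboard's own computation: the function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical, and it assumes "fires" means an activation strictly above zero.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one firing position out of 100,000 tokens.
acts = [0.0] * 99_999 + [3.2]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.001%
```

Under this assumption, a density of 0.001% corresponds to roughly one active token per 100,000, which fits a neuron that responds to a single specific token.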