INDEX
Explanations
examples
This neuron activates on instructions asking for additional examples (e.g. “add a few more examples to the list”).
Negative Logits
Mathematic   -0.07
compat       -0.07
jiných       -0.07
цю           -0.07
TPP          -0.06
英语         -0.06
please       -0.06
timer        -0.06
budeme       -0.06
丈           -0.06
Positive Logits
Stem         0.07
nepří        0.07
ations       0.06
oracle       0.06
ORIZ         0.06
空间         0.06
tarım        0.06
ids          0.06
balık        0.06
agram        0.06
Activations Density: 0.000%