Explanations
likelihood
This neuron never activates—it doesn’t respond to any token patterns.
Negative Logits
anguage     -0.07
´s          -0.07
azzo        -0.06
sermon      -0.06
590         -0.06
р           -0.06
Тур         -0.06
Patron      -0.06
receptor    -0.06
ерта        -0.06
Positive Logits
likelihood              0.11
likelihood              0.07
elihood                 0.07
Lik                     0.07
Leaders                 0.07
(token not rendered)    0.07
lãi                     0.07
MK                      0.06
":[                     0.06
Lucas                   0.06
Activations Density 0.004%