Explanations
proper nouns
This neuron activates on proper names or name-like tokens (i.e. personal or character names).
Negative Logits
primes       -0.06
Titan        -0.06
olves        -0.06
练           -0.06
險           -0.06
             -0.06
ベ           -0.06
nelle        -0.06
_teams       -0.06
instruction  -0.06
Positive Logits
jj           0.08
rtle         0.07
Katrina      0.07
нен          0.07
Комп         0.07
=this        0.06
=\""         0.06
//"          0.06
vůbec        0.06
ुकस          0.06
Activations Density 0.143%