Explanations
Code/reports/documents
This neuron primarily activates on common small “function” words—articles (a, the), auxiliaries/modals (will, can), conjunctions (that), and simple prepositions.
Negative Logits
getC        -0.06
Россия      -0.06
/people     -0.06
-img        -0.06
云          -0.06
Pant        -0.06
GM          -0.06
Imperial    -0.06
Snowden     -0.06
books       -0.06
Positive Logits
_lost             0.07
ylabel            0.06
_typeDefinition   0.06
νεφ               0.06
disc              0.06
Dip               0.06
loyd              0.06
Spielberg         0.06
lik               0.06
replic            0.06
Activations Density 0.278%
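
The page does not say how the density figure is computed. One common reading, taken here as an assumption, is the fraction of sampled token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of active positions, not an
    average activation magnitude.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 28 of them active.
sample = [0.0] * 10_000
for i in range(28):
    sample[i * 300] = 1.5

density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.278% would mean the neuron fires on roughly 28 of every 10,000 sampled tokens.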