INDEX
Explanations
The neuron fires on tokens marking the bibliography or references section (e.g. “bib,” “bibliography,” or “references.bib”).
Negative Logits
Token     Logit
Fairy     -0.07
verb      -0.07
Exclude   -0.07
Sentry    -0.06
astr      -0.06
HOUSE     -0.06
Thy       -0.06
hangi     -0.06
SOUR      -0.06
허        -0.06
Positive Logits
Token         Logit
λε            0.07
(parameters   0.07
primera       0.07
是不          0.06
.tipo         0.06
faaliyet      0.06
。不          0.06
amız          0.06
diğer         0.06
wk            0.06
Activation Density: 0.001%