Explanations
The neuron activates most strongly on proper names or titles (especially movie titles) in the text.
Negative Logits
Token        Logit
;;           -0.08
shit         -0.07
installer    -0.06
sued         -0.06
名           -0.06
mối          -0.06
Nose         -0.06
tắc          -0.06
suing        -0.06
bean         -0.06
Positive Logits
Token        Logit
Schl         0.07
Consulta     0.07
zeich        0.06
hashlib      0.06
immutable    0.06
destined     0.06
.articles    0.06
Giuliani     0.06
ομά          0.06
ски          0.06
Activations Density: 0.061%