INDEX
Explanations
distribution
The neuron fires on occurrences of the word “distribution” (as found in license-header comment blocks).
Negative Logits
。他          -0.07
dar          -0.06
Lazar        -0.06
iki          -0.06
acağız       -0.06
lái          -0.06
Aerospace    -0.06
tasar        -0.06
ceny         -0.06
Gordon       -0.06
Positive Logits
arpa         0.07
rib          0.07
currentTime  0.07
댓글         0.07
日本         0.06
ιθ           0.06
mpl          0.06
/in          0.06
_preview     0.06
хозяй        0.06
Activations Density 0.001%