Explanations
This neuron activates selectively on occurrences of the word “information” and on closely related terms that appear in diffusion contexts.
Negative Logits
Token            Logit
�                -0.07
Comet            -0.06
>{               -0.06
Від              -0.06
instructional    -0.06
таб              -0.06
crude            -0.06
усти             -0.06
']])↵            -0.06
inequality       -0.06
Positive Logits
Token            Logit
proje            0.07
cinco            0.07
pleasantly       0.06
gl               0.06
Bellev           0.06
whim             0.06
gelen            0.06
Serie            0.06
ـ                0.06
linspace         0.06
Activations Density 0.101%
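
For orientation, an activation density figure like the one above is commonly read as the fraction of token positions on which the neuron fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are assumptions made for illustration, not the method this dashboard is known to use.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    neuron is active. The dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 101 of them active.
acts = [0.0] * 99_899 + [1.5] * 101
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.101%"
```

Under this reading, a density of 0.101% corresponds to the neuron firing on roughly one token position in a thousand, which fits a neuron tied to a single word and a few close relatives.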