Explanations
This neuron fires on terms in the text that describe long-range interactions or correlations.
Negative Logits
Heather  -0.07
Query  -0.06
CustomAttributes  -0.06
mz  -0.06
mlx  -0.06
dw  -0.06
شهید  -0.06
истор  -0.06
달  -0.06
bureauc  -0.06
Positive Logits
_LONG  0.07
하지  0.07
reach  0.07
ابة  0.07
concentrates  0.07
transported  0.06
charakter  0.06
действия  0.06
الإن  0.06
Lesson  0.06
Activations Density 0.003%