INDEX
Explanations
The neuron strongly activates on occurrences of the word “the.”
Negative Logits
/ts            -0.06
ブ             -0.06
э              -0.06
economics      -0.06
screenshot     -0.06
B              -0.06
sacrifices     -0.06
_phase         -0.06
.Tools         -0.06
bureaucracy    -0.06
Positive Logits
immer          0.07
Brick          0.07
_render        0.07
Asheville      0.07
retry          0.06
.Required      0.06
атки           0.06
fullest        0.06
:both          0.06
-for           0.06
Activations Density: 0.026%