INDEX
Explanations
Interests and hobbies
This neuron activates on words naming personal hobbies, interests, or leisure activities.
Negative Logits
Logit   Token
-0.08   Self
-0.07   poured
-0.07   NP
-0.07   Control
-0.07   Capture
-0.07   지원
-0.06   обрет
-0.06   _package
-0.06   работать
-0.06   Skinny
Positive Logits
Logit   Token
0.07
0.06
0.06    ;width
0.06    ():↵
0.06    :^
0.06    ■
0.06    correl
0.06    ('~
0.06    �
0.06    "]),
Activations Density 0.113%
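
The density figure can be read as the share of sampled token positions on which the neuron fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    neuron is active. The dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 11 of which fire.
sample = [0.0] * 9989 + [0.5] * 11
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.110%
```

Under this reading, a density of 0.113% would mean the neuron is active on roughly 1 in 900 sampled tokens, which is consistent with a narrow, topic-specific neuron.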