Explanations
The neuron detects descriptive adjectives referring to a person’s physical appearance.
Negative Logits
력이            -0.07
полити        -0.07
Until         -0.07
trademarks    -0.06
$/)           -0.06
دنیا          -0.06
)에           -0.06
.:.           -0.06
Isn           -0.06
ازد           -0.06
Positive Logits
UNIT          0.08
Sit           0.07
Entr          0.06
(hdr          0.06
stronghold    0.06
ior           0.06
ελ            0.06
Low           0.06
getter        0.06
φαρ           0.06
Activations Density: 0.493%