Explanations
Descriptive
This neuron activates on descriptive adjectives and adverbs, i.e. words that qualify degree or describe a trait.
Negative Logits
halluc      -0.07
blow        -0.07
utex        -0.06
包括        -0.06
tır         -0.06
_sec        -0.06
�认         -0.06
reply       -0.06
Dirk        -0.06
Blizzard    -0.06
Positive Logits
البحر       0.06
dirname     0.06
arts        0.06
thân        0.06
traveler    0.06
.mar        0.06
Pow         0.06
straw       0.06
무          0.05
Angiosper   0.05
Activations Density 0.064%
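Activation density is usually read as the fraction of sampled token positions on which the neuron fires. The following is a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the toy data are illustrative choices, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Toy data: 10,000 token positions, 6 of which activate the neuron.
acts = [0.0] * 10_000
for position, value in [(3, 0.8), (17, 1.2), (256, 0.1),
                        (1024, 2.5), (4096, 0.4), (9999, 0.9)]:
    acts[position] = value

density = activation_density(acts)
print(f"{density:.3%}")
```

On this reading, a density of 0.064% corresponds to roughly 6 or 7 activating tokens per 10,000, which would make this a sparse, selective neuron.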