Explanations
This neuron fires on evaluative sentiment words: adjectives or adverbs that express a positive or negative opinion.
Negative logits
Token         Logit
sth           -0.06
descending    -0.06
mony          -0.06
tong          -0.06
zburg         -0.06
urses         -0.06
Knife         -0.06
France        -0.05
atz           -0.05
vil           -0.05
Positive logits
Token                 Logit
화                    0.07
formedURLException    0.07
tienen                0.07
Cre                   0.06
_CHANGE               0.06
UITextView            0.06
indign                0.06
_sf                   0.06
áci                   0.06
_sha                  0.06
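The page does not say how the two logit lists are computed. One common approach, assumed here rather than confirmed, is direct logit attribution: take the dot product of the neuron's output-weight vector with each token's unembedding column, then report the most negative and most positive results. The sketch below shows only that ranking step in pure Python. The names `logit_effects`, `top_and_bottom`, `w_out` and `unembed` are hypothetical, the 2-dimensional vectors are made up, and a real dashboard may apply extra steps (such as layer-norm scaling) that are not shown.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    w_out: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot the neuron's (hypothetical) output-weight vector with each token's unembedding column."""
    return {
        token: sum(w * u for w, u in zip(w_out, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    return ranked[:k], ranked[::-1][:k]


# Toy example: made-up vectors, chosen so the arithmetic is exact.
w_out = [0.5, -0.25]
unembed = {
    "good": [1.0, -1.0],
    "bad": [-1.0, 1.0],
    "the": [0.0, 0.0],
}
effects = logit_effects(w_out, unembed)
negative, positive = top_and_bottom(effects, k=1)
print(negative)  # [('bad', -0.75)]
print(positive)  # [('good', 0.75)]
```

Sorting once and slicing from both ends produces both lists from a single pass over the vocabulary. This matches how the page shows a "negative" and a "positive" list side by side.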
Activation density: 0.076%
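Activation density is usually read as the share of tokens on which the neuron is active. The page does not state its threshold or sample, so the sketch below simply assumes that "active" means an activation strictly greater than zero. The function name `activation_density` is hypothetical. The activation values are invented, picked only so that the printed percentage matches the figure shown above. They are not the dashboard's real data.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0.0)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return active / total


# Hypothetical sample: 76 firing tokens out of 100,000.
acts = [0.0] * 99_924 + [1.0] * 76
density = activation_density(acts)
print(f"{density:.3%}")  # 0.076%
```

A density this low means the neuron stays silent on almost every token. That pattern fits a narrow trigger such as evaluative sentiment words.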