Explanations
attractive
The neuron responds to positive evaluative words (e.g. adjectives expressing approval or appeal).
Negative Logits
happening    -0.07
on           -0.07
sending      -0.07
helps        -0.07
_col         -0.07
Prob         -0.07
totals       -0.07
Work         -0.07
hurt         -0.06
dos          -0.06
Positive Logits
attractive        0.12
tractive          0.09
attractiveness    0.09
brightly          0.08
glamorous         0.08
巨                0.07
_THEME            0.07
entrenched        0.07
леч               0.07
رانه              0.07
Activations Density 0.009%
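
As a rough illustration of what a density figure like this measures, the sketch below assumes that density is the fraction of sampled tokens on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the toy data are assumptions for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition of density; the page itself does not state one.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 9 active tokens out of 100,000 sampled tokens.
toy_activations = [1.5] * 9 + [0.0] * 99_991
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.009%
```

Under this reading, a density of 0.009% means the neuron fires on roughly 9 of every 100,000 tokens, so it is a very sparse unit.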