Explanations
This neuron activates on subjective evaluative adjectives and adverbs that express opinions or judgments about quality (e.g. “poor,” “positive,” “prestigious,” “catastrophic”).
Negative Logits
Token         Logit
Birmingham    -0.07
Kentucky      -0.07
d             -0.07
posing        -0.06
dull          -0.06
〃            -0.06
.UndefOr      -0.06
REL           -0.06
Officers      -0.06
werde         -0.06
Positive Logits
Token         Logit
ropa          0.06
-opt          0.06
Бел           0.06
INIT          0.06
svět          0.06
етод          0.06
ственное      0.06
aciones       0.06
spi           0.06
aris          0.06
Activations Density: 0.032%