Explanations
The neuron activates on comparative/superlative words and phrases signaling evaluation or “what works best.”
Negative Logits
Token          Logit
sıcak          -0.06
.Small         -0.06
Ivory          -0.06
хов            -0.06
بای            -0.06
.Absolute      -0.06
?>">↵          -0.06
三三            -0.06
erialize       -0.06
slaughtered    -0.05
Positive Logits
Token          Logit
disaster       0.08
esModule       0.07
monument       0.07
monuments      0.07
Homo           0.07
FileSync       0.07
ρευ            0.07
ster           0.07
компании       0.06
чних           0.06
Activations Density 0.013%