Explanations
The neuron activates on words related to melting, including any form of “melt” or “molten.”
Negative Logits
Token     Logit
organ     -0.07
ourage    -0.07
Pří       -0.07
dzi       -0.07
gien      -0.07
Ezek      -0.06
oused     -0.06
gebra     -0.06
dispar    -0.06
sovere    -0.06
Positive Logits
Token              Logit
melt               0.11
melted             0.10
melting            0.09
melts              0.09
(no token shown)   0.08
MET                0.07
meltdown           0.07
Netherlands        0.06
RCT                0.06
warming            0.06
Activations Density 0.004%
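As a rough illustration of what the density figure means, the sketch below assumes it is the fraction of sampled tokens on which the neuron's activation is above zero. That definition, the function name activation_density, and the sample data are assumptions for illustration only; they do not come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the neuron fires at all (activation > 0).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 4 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_996 + [1.3, 0.7, 2.1, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

At a density of 0.004%, the neuron would fire on about 4 of every 100,000 tokens, which fits a neuron tied to a narrow word family such as “melt.”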