INDEX
Explanations
The main thing this neuron does is detect occurrences of heating or high‐temperature terms (e.g., “hot,” “heated,” “heating”).
New Auto-Interp
Negative Logits
958
-0.07
137
-0.07
-light
-0.06
хроничес
-0.06
853
-0.06
UNK
-0.06
effectiveness
-0.06
bben
-0.06
prerequisites
-0.06
options
-0.06
POSITIVE LOGITS
Eat
0.07
Ire
0.07
çözüm
0.07
sinc
0.07
entren
0.06
neckline
0.06
;if
0.06
maxY
0.06
pand
0.06
终于
0.06
Activations Density 0.010%