INDEX
Explanations
references to nutrition and nutritional content
Neuron Alignment
Index    Value    % of L₁
156      +0.25    1.5%
224      +0.13    0.8%
328      +0.12    0.7%
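The page does not say how the "% of L₁" column is derived. One reading, offered here only as an assumption, is that each value is divided by the L1 norm of the full weight vector it comes from; the three rows above are consistent with a norm of roughly 16 to 17 under that reading. A minimal sketch, where `neuron_alignment` and its arguments are hypothetical names rather than anything defined by the source:

```python
def neuron_alignment(direction, top_k=3):
    """Rank components of a weight vector by magnitude.

    ASSUMPTION: "% of L1" means |value| / sum(|all values|). The source
    page does not define the column, so treat this as illustrative only.
    Returns a list of (index, value, share_of_l1) tuples.
    """
    # L1 norm of the whole vector: the sum of absolute values.
    l1 = sum(abs(v) for v in direction)
    # Sort indices by absolute value, largest first (stable for ties).
    ranked = sorted(enumerate(direction), key=lambda iv: abs(iv[1]), reverse=True)
    # Guard against an all-zero vector to avoid dividing by zero.
    return [(i, v, abs(v) / l1 if l1 else 0.0) for i, v in ranked[:top_k]]


# Toy vector, not data from the page.
rows = neuron_alignment([0.5, -0.25, 0.25], top_k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this assumption a small share such as 1.5% would mean the vector is spread across many neurons rather than concentrated on a few.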
Correlated Neurons
Index    P. Corr.    Cos Sim.
156      +0.25       0.02
224      +0.13       0.02
328      +0.12       0.02
Negative Logits
edes        -1.81
ich         -1.73
pez         -1.68
unts        -1.56
betrayal    -1.56
raf         -1.54
pe          -1.50
abouts      -1.48
wit         -1.45
idegger     -1.41
Positive Logits
dioxide     1.83
intake      1.81
ist         1.78
¢           1.71
etic        1.71
arium       1.64
ists        1.62
etics       1.51
esta        1.50
ICES        1.49
Activations Density 0.142%