INDEX
Explanations
mentions of people and their actions or characteristics
Neuron Alignment
Index   Value   % of L₁
1575    +0.12   0.3%
683     +0.09   0.3%
1379    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1415    +0.12      0.03
1575    +0.09      0.04
1450    +0.09      0.02
Negative Logits
Augu       -1.13
wherea     -1.10
volunte    -1.09
secon      -1.04
increa     -1.03
depic      -1.03
michelin   -1.02
lill       -1.02
oner       -1.01
encomp     -1.01
Positive Logits
except           0.66
MessageOf        0.64
either           0.62
except           0.59
ViewFeatures     0.58
intios           0.57
usually          0.56
ostavi           0.56
complexContent   0.55
зулта            0.54
Activations Density 0.339%