INDEX
Explanations
text related to physics concepts, especially focused on phenomena observed in experiments
Neuron Alignment
Index   Value   % of L₁
1127    +0.09   0.3%
680     +0.08   0.2%
1533    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.09      0.06
283     +0.08      0.02
1363    +0.08      0.04
Negative Logits
            -0.60
Adicion     -0.60
bill        -0.58
spread      -0.57
addition    -0.56
bill        -0.55
רושלים      -0.54
trace       -0.53
Crear       -0.53
grinned     -0.52
Positive Logits
sappi       1.14
xi          1.09
soggior     1.04
oliver      1.04
affez       1.03
Xi          0.99
cushi       0.99
XI          0.98
espri       0.98
cammin      0.97
Activations Density 0.347%