INDEX
Explanations
references to taking tours
Neuron Alignment
Index   Value   % of L₁
1937    +0.20   0.8%
555     +0.14   0.5%
1023    +0.12   0.4%
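
The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's direction vector. The dashboard does not state this, so treat it as an assumption. Under that reading, the sketch below ranks the components of a vector by magnitude and reports each one's share. The function name `neuron_alignment` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def neuron_alignment(direction, k=3):
    """Return the k largest-magnitude components as (index, value, pct_of_l1).

    Assumption: "% of L1" means 100 * |value| / sum(|all values|).
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []  # a zero vector has no meaningful shares
    # Rank component indices by absolute value, largest first.
    ranked = sorted(range(len(direction)),
                    key=lambda i: abs(direction[i]),
                    reverse=True)
    return [(i, direction[i], 100.0 * abs(direction[i]) / l1)
            for i in ranked[:k]]


# Toy, hypothetical direction vector (not real feature data).
toy = [0.5, -0.25, 0.0, 0.25]
rows = neuron_alignment(toy, k=2)
for index, value, pct in rows:
    print(f"{index:>5}  {value:+.2f}  {pct:.1f}%")
```

Under the same assumption, a value of +0.20 at 0.8% would imply a total L₁ norm of roughly 25 for this feature's direction.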
Correlated Neurons
Index   P. Corr.   Cos Sim.
1937    +0.20      0.03
555     +0.14      0.02
1492    +0.12      0.02
Negative Logits
vache     -0.73
Venise    -0.65
menthe    -0.65
boxe      -0.64
canne     -0.59
Janvier   -0.59
Mère      -0.58
siff      -0.58
Diman     -0.57
Église    -0.56
Positive Logits
tour         1.37
tour         1.27
tours        1.24
Tour         1.23
Tour         1.17
TOUR         1.12
toured       1.07
Tours        1.04
touring      1.03
tourmaline   0.97
Activations Density: 0.061%