INDEX
Explanations
online forums or discussion threads
Neuron Alignment
Index   Value   % of L₁
1271    +0.14   0.6%
201     +0.14   0.5%
1870    +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
201     +0.14      0.01
1271    +0.14      0.02
1637    +0.13      0.01
Negative Logits
gend    -0.55
zyn     -0.52
kram    -0.52
Gez     -0.50
gesta   -0.49
krab    -0.48
wein    -0.47
glan    -0.47
Pá      -0.47
petra   -0.47
Positive Logits
forum    1.35
Forum    1.25
forums   1.21
Forum    1.20
forum    1.19
Forums   1.13
FORUM    1.01
FORUM    0.99
Forums   0.89
forums   0.84
Activations Density: 0.101%