INDEX
Explanations
mention of specific locations or events
Neuron Alignment
Index   Value   % of L₁
674     +0.49   1.8%
1967    +0.21   0.8%
184     +0.14   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1967    +0.49      0.05
1166    +0.21      0.07
16      +0.14      0.07
Negative Logits
Token   Logit
What    -0.66
.       -0.64
There   -0.64
It      -0.62
!       -0.62
That    -0.61
How     -0.61
The     -0.60
Not     -0.60
What    -0.59
Positive Logits
Token       Logit
ftu         1.65
fta         1.64
swarovski   1.52
ricardo     1.50
jorge       1.47
Juf         1.46
dises       1.46
sergio      1.46
fup         1.45
doman       1.43
Activations Density 0.660%