Explanations
mentions of school-related activities and fundraising efforts
Neuron Alignment
Index   Value   % of L₁
738     +0.18   0.5%
1403    +0.14   0.4%
381     +0.14   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1403    +0.18      0.02
270     +0.14      0.04
1553    +0.14      0.04
Negative Logits
Token     Logit
embra     -1.21
dises     -1.19
exem      -1.17
dispen    -1.15
igno      -1.13
oner      -1.12
emphat    -1.11
volunte   -1.08
effe      -1.07
Sén       -1.07
Positive Logits
Token       Logit
<bos>       0.89
or          0.68
usually     0.64
或者        0.64
или         0.63
your        0.61
you         0.61
yourself    0.59
typically   0.59
或          0.59
Activations Density: 0.510%