Explanations
phrases related to environmental impact and destruction
Neuron Alignment
Index    Value    % of L₁
382      +0.15    0.4%
1741     +0.14    0.4%
1265     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
382      +0.15       0.07
1265     +0.14       0.04
1757     +0.10       0.04
Negative Logits
tuta     -1.14
umo      -1.12
mef      -1.08
lele     -1.07
sii      -1.06
kasa     -1.04
vnt      -1.03
fte      -1.02
„,       -1.00
istan    -0.99
Positive Logits
but      0.69
3        0.66
4        0.65
2        0.64
6        0.63
5        0.62
which    0.62
0        0.62
8        0.61
9        0.60
Activations Density: 0.288%