Explanations
phrases related to environmental issues and climate change
Neuron Alignment
Index   Value   % of L₁
156     +0.14   0.8%
186     +0.10   0.6%
199     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
186     +0.14      0.03
156     +0.10      0.13
199     +0.10      0.10
Negative Logits
ĥ½      -2.23
¿       -1.94
»       -1.91
·       -1.91
·¸      -1.85
↵       -1.84
↵       -1.84
č↵      -1.84
↵       -1.84
↵↵      -1.84
Positive Logits
causes      1.82
naire       1.79
etc         1.76
?,          1.65
(§          1.63
holder      1.61
[...]       1.59
arium       1.59
?",         1.58
occasions   1.55
Activations Density: 1.710%