INDEX
Explanations
text related to extreme environments, such as Antarctica, and the challenges faced in those conditions
Neuron Alignment
Index   Value   % of L₁
50      +0.20   0.7%
1150    +0.19   0.7%
184     +0.15   0.5%
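The page does not define the "% of L₁" column. One plausible reading, offered here as an assumption, is that each value is that neuron's share of the L₁ norm of the feature's direction vector: the absolute value divided by the sum of absolute values over all neurons. The sketch below illustrates that arithmetic on a toy vector; the function name `top_neuron_alignment` and the input are hypothetical and are not taken from the dashboard.

```python
def top_neuron_alignment(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| / sum(|all values|). The source page
    does not confirm this definition.
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []  # a zero vector has no meaningful shares
    # Rank entries by absolute value, largest first.
    ranked = sorted(enumerate(direction), key=lambda iv: abs(iv[1]), reverse=True)
    return [(i, v, abs(v) / l1) for i, v in ranked[:k]]


if __name__ == "__main__":
    # Toy vector, not data from the dashboard.
    for index, value, share in top_neuron_alignment([0.5, -0.3, 0.0, 0.2], k=2):
        print(f"{index:<6}{value:+.2f}   {share:.1%}")
```

Under this reading, the rows above (for example +0.20 at 0.7%) would imply a direction vector whose L₁ norm is roughly 28 to 30, but that is an inference from rounded figures rather than something the page states.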
Correlated Neurons
Index   P. Corr.   Cos Sim.
184     +0.20      0.01
163     +0.19      0.01
1438    +0.15      0.02
Negative Logits
embra     -2.07
fta       -2.07
ftu       -2.04
desir     -2.03
squa      -2.03
volunte   -2.00
emphat    -1.98
fte       -1.95
?...      -1.95
dispen    -1.94
Positive Logits
.       0.92
。      0.80
।       0.73
.”      0.73
!       0.71
↵↵      0.70
<eos>   0.70
).      0.70
."      0.69
().     0.69
Activations Density 0.279%