INDEX
Explanations
phrases related to freedom, liberation, and widening compassion
Neuron Alignment
Index    Value    % of L₁
1984     +0.11    0.3%
845      +0.10    0.3%
1177     +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1984     +0.11       0.04
1334     +0.10       0.03
1892     +0.09       0.03
Negative Logits
oubted              -0.67
<bos>               -0.66
EconPapers          -0.59
ometrial            -0.55
asantry             -0.52
firebaseConfig      -0.50
InstrumentedTest    -0.50
DoubleQuotes        -0.50
getSource           -0.49
oneofs              -0.49
Positive Logits
atguigu      0.68
Apesar       0.54
logis        0.51
<?           0.51
conforman    0.50
pama         0.50
katun        0.49
Mesmo        0.49
Visión       0.48
Inoltre      0.48
Activations Density 0.261%