INDEX
Explanations
mentions of dreams or aspirations
Neuron Alignment
Index   Value   % of L₁
50      +0.09   0.4%
537     +0.06   0.2%
1978    +0.06   0.2%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
86      +0.09           0.03
1430    +0.06           0.03
882     +0.06           0.03
Negative Logits
<bos>             -1.22
public            -0.80
//                -0.75
,                 -0.74
<eos>             -0.73
(no token text)   -0.73
(no token text)   -0.73
@                 -0.72
ുറ                -0.71
(no token text)   -0.71
Positive Logits
affor      2.41
maneu      2.30
increa     2.28
impra      2.21
scrat      2.15
inev       2.12
guarante   2.12
strick     2.09
desir      2.07
accla      2.06
Activations Density 0.121%