INDEX
Explanations
terms indicating inclusion or association with a group
Neuron Alignment
Index   Value   % of L₁
50      +0.25   1.2%
645     +0.14   0.7%
196     +0.11   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
645     +0.25      0.05
1035    +0.14      0.04
1052    +0.11      0.04
Negative Logits
<bos>      -3.09
put        -0.77
me         -0.74
create     -0.72
get        -0.71
text       -0.71
create     -0.71
/**        -0.71
div        -0.71
operate    -0.70
Positive Logits
milf       2.10
increa     2.09
maneu      2.07
affor      2.07
wien       2.04
🤣🤣       2.03
stockholm  2.00
desir      1.99
inev       1.98
perfet     1.97
Activations Density: 0.053%