INDEX
Explanations
phrases related to personal identity and self-acceptance
Neuron Alignment
Index    Value    % of L₁
1533     +0.11    0.3%
919      +0.10    0.3%
674      +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1533     +0.11       0.01
919      +0.10       0.02
185      +0.09       0.03
Negative Logits
centrif    -1.09
Khart      -1.05
Fg         -1.03
nephe      -1.02
daf        -1.00
Augu       -1.00
dci        -0.98
pixabay    -0.97
auri       -0.97
olx        -0.96
Positive Logits
<bos>            0.94
courage          0.62
disclose         0.59
disclosure       0.57
gradually        0.56
transitioning    0.56
bravely          0.55
openly           0.55
sexuality        0.54
slowly           0.54
Activations Density: 0.369%