INDEX
Explanations
sentences related to personal reflection and self-realization
Neuron Alignment
Index    Value    % of L₁
1967     +0.18    0.5%
1438     +0.13    0.4%
1577     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
184      +0.18       0.03
1235     +0.13       0.06
1613     +0.12       0.05
Negative Logits
Token       Logit
scrat       -1.18
milf        -1.16
increa      -1.13
maneu       -1.11
strick      -1.10
affor       -1.08
stickied    -1.07
secon       -1.07
shenan      -1.07
impra       -1.05
Positive Logits
Token       Logit
erráneo     0.48
ريقيا       0.46
अलावा       0.45
".          0.43
TintMode    0.43
cross       0.42
'.          0.42
”.          0.42
螂          0.41
makeText    0.41
Activations Density: 0.953%