INDEX
Explanations
positive expressions referring to commitment, drive, and love
Neuron Alignment
Index   Value   % of L₁
50      +0.10   0.3%
1823    +0.10   0.3%
227     +0.09   0.3%
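The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption rather than the dashboard's documented method, is that it reports each neuron's absolute weight as a share of the L1 norm of the full weight vector. The sketch below illustrates that reading on a toy vector; the function name `l1_share` and the toy values are hypothetical and are not taken from the feature shown here.

```python
def l1_share(weights):
    """Return each entry's absolute value as a fraction of the vector's L1 norm.

    Assumption: this mirrors what a "% of L1" column might report; the
    dashboard's actual computation is not stated in the source.
    """
    l1 = sum(abs(w) for w in weights)  # L1 norm: sum of absolute values
    if l1 == 0:
        return [0.0 for _ in weights]  # avoid division by zero for an all-zero vector
    return [abs(w) / l1 for w in weights]


# Toy vector for illustration only (not the real feature weights).
toy_weights = [0.10, -0.30, 0.05, 0.55]
shares = l1_share(toy_weights)

for w, s in zip(toy_weights, shares):
    print(f"{w:+.2f}  {s:.1%}")
```

Under this reading, the shares across all neurons sum to 100%, so a top entry of only 0.3% would indicate that the weight is spread thinly over many neurons rather than concentrated in a few.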
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.10      0.06
1823    +0.10      0.03
260     +0.09      0.03
Negative Logits
Token   Logit
kasa    -1.50
umo     -1.48
lele    -1.45
jaya    -1.43
levis   -1.42
hina    -1.41
mef     -1.38
lyon    -1.38
kug     -1.37
makro   -1.36
Positive Logits
Token      Logit
<bos>      0.83
himself    0.74
always     0.64
proud      0.63
loves      0.63
feels      0.62
believes   0.62
enjoys     0.61
prefers    0.59
still      0.58
Activations Density: 0.345%