Explanations
phrases related to personal opinions or thoughts
Neuron Alignment
Index   Value   % of L₁
1757    +0.19   0.7%
605     +0.15   0.5%
478     +0.12   0.4%
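The page does not say how the % of L₁ column is computed. Here is a minimal sketch. It assumes the column is each neuron's absolute decoder weight divided by the L₁ norm of the feature's whole decoder vector, and that rows are ranked by absolute weight. `l1_share` and the toy vector are hypothetical. Under that reading, the rows above imply an L₁ norm of roughly 27 to 30 (for example, 0.15 / 0.005 = 30).

```python
import numpy as np

def l1_share(decoder_vec: np.ndarray, top_k: int = 3):
    """Hypothetical helper: return (index, value, % of L1) for the top_k
    neurons, ASSUMING % of L1 = |weight| / sum(|weights|) and ranking by |weight|."""
    l1 = np.abs(decoder_vec).sum()                  # L1 norm of the whole vector
    order = np.argsort(-np.abs(decoder_vec))[:top_k]  # largest magnitudes first
    return [(int(i), float(decoder_vec[i]), float(100.0 * abs(decoder_vec[i]) / l1))
            for i in order]

# Hypothetical toy decoder vector (not this feature's real weights).
vec = np.array([0.05, -0.10, 0.40, 0.25, -0.20])
for idx, val, pct in l1_share(vec):
    print(f"{idx:>4}  {val:+.2f}  {pct:.1f}%")
```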
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1757    +0.19           0.06
1141    +0.15           0.04
976     +0.12           0.05
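Here is a minimal sketch of the two columns. It assumes P. Corr. is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and Cos Sim. is the cosine similarity between two weight vectors. The page does not state which vectors the dashboard compares. The helper names and arrays are hypothetical.

```python
import numpy as np

def pearson(x, y) -> float:
    """Pearson correlation of two equal-length activation series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()   # center both series
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def cosine(u, v) -> float:
    """Cosine similarity of two vectors (which vectors is an assumption)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical per-token activations for one feature and one neuron.
feature_acts = np.array([0.0, 0.0, 2.0, 0.0, 1.0])
neuron_acts = np.array([0.1, -0.2, 1.5, 0.0, 0.9])
print(round(pearson(feature_acts, neuron_acts), 3))
print(round(cosine([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]), 3))  # 0.5
```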
Negative Logits
maksi    -1.21
?...     -1.17
erik     -1.12
🤣🤣     -1.12
purcha   -1.07
reluct   -1.06
!...     -1.06
antik    -1.06
depic    -1.05
milf     -1.04
Positive Logits
really      1.06
really      0.99
Really      0.91
Really      0.89
REALLY      0.89
wirklich    0.70
<bos>       0.69
realmente   0.65
truly       0.64
naprawdę    0.58
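The two logit lists can be read as the tokens whose output logits the feature most lowers and most raises. Here is a minimal sketch. It assumes the values come from projecting the feature's decoder vector directly through the unembedding matrix, ignoring the final layer norm. `logit_effects`, the toy vocabulary, and `W_U` are hypothetical.

```python
import numpy as np

def logit_effects(decoder_vec, W_U, vocab, k=3):
    """Hypothetical helper: project a decoder vector through an unembedding
    matrix (ASSUMED direct projection, no final layer norm) and return the
    k most negative and k most positive (token, logit) pairs."""
    logits = decoder_vec @ W_U                 # shape: (vocab_size,)
    order = np.argsort(logits)                 # ascending logit order
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

vocab = ["really", "truly", "maybe", "erik", "the"]   # hypothetical toy vocabulary
W_U = np.array([[1.0, 0.6, 0.0, -1.0, 0.1],
                [0.5, 0.4, -0.3, -0.2, 0.0]])          # hypothetical (d_model=2, vocab=5)
d = np.array([1.0, 0.0])                               # hypothetical decoder vector
neg, pos = logit_effects(d, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```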
Activation Density: 0.083%
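A density of 0.083% means the feature fires on roughly 1 in 1,200 tokens (1 / 0.00083 ≈ 1,205). Here is a minimal sketch. It assumes density is the percentage of tokens whose activation is above zero. `activation_density` and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0) -> float:
    """Hypothetical helper: percentage of tokens whose feature activation
    exceeds `threshold` (ASSUMED definition of activation density)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical activations: 3 of 4,000 tokens fire.
acts = np.zeros(4000)
acts[[10, 500, 3999]] = [1.2, 0.4, 2.0]
print(f"{activation_density(acts):.3f}%")  # 0.075%
```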