INDEX
Explanations
phrases related to embracing or accepting something
Neuron Alignment
Index   Value   % of L₁
1137    +0.10   0.3%
25      +0.10   0.3%
1512    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1892    +0.10      0.02
1512    +0.10      0.02
25      +0.10      0.02
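The two columns above, P. Corr. and Cos Sim., can differ substantially for the same neuron, as they do here (+0.10 versus 0.02). The sketch below shows one way the two statistics relate, assuming both are computed over the same pair of activation vectors; the page itself does not state how they are computed. The function names and the sample vectors are hypothetical and are not taken from the dashboard.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after mean-centering each vector."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)

# Hypothetical activation vectors (assumption, for illustration only).
feature_acts = [1.0, 2.0, 3.0]
neuron_acts = [3.0, 2.0, 1.0]

print(cosine_similarity(feature_acts, neuron_acts))
print(pearson_correlation(feature_acts, neuron_acts))
```

Mean-centering is the only difference between the two measures, so they agree when both vectors already have zero mean and can diverge otherwise.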
Negative Logits
<bos>         -0.68
SizeMode      -0.57
otheby        -0.57
.*")]         -0.55
WriteHeader   -0.54
('.'          -0.54
TextSpan      -0.52
ManyToOne     -0.52
("."          -0.50
]^{-          -0.50
Positive Logits
embra      1.27
strick     1.20
depic      1.19
snoopy     1.14
hentai     1.14
ftu        1.13
fta        1.13
thut       1.11
apprehen   1.10
reluct     1.09
Activations Density 0.076%
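An activation density of 0.076% corresponds to roughly 76 active tokens per 100,000. The sketch below shows one common way to compute such a figure, assuming density means the fraction of tokens on which the feature's activation exceeds a threshold of zero; the page does not give its exact definition. The function name, the threshold parameter, and the synthetic data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic example (assumption): 76 active tokens out of 100,000.
sample = [1.0] * 76 + [0.0] * 99_924
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low indicates a sparse feature that fires on a small fraction of tokens.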