INDEX
Explanations
phrases related to emotional intensity and reactions
Neuron Alignment
Index  Value  % of L₁
674    +0.26  0.8%
184    +0.14  0.4%
332    +0.10  0.3%
Correlated Neurons
Index  P. Corr.  Cos Sim.
284    +0.26     0.08
1363   +0.14     0.06
1013   +0.10     0.07
Negative Logits
increa  -3.58
effe    -3.51
inev    -3.50
fta     -3.40
ftu     -3.39
impra   -3.39
affor   -3.38
unden   -3.37
wien    -3.36
reluct  -3.34
Positive Logits
<bos>            1.40
BoxFit           1.01
GOTREF           0.97
.                0.91
ClientRect       0.90
↵↵               0.89
…                0.88
HomeAsUpEnabled  0.87
                 0.86
WithFormat       0.86
Activations Density 0.744%