Explanations
conjunctions connecting phrases or clauses
Neuron Alignment
Index   Value   % of L₁
9       +0.10   0.3%
1842    +0.10   0.3%
674     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1265    +0.10      0.04
1892    +0.10      0.04
1806    +0.10      0.04
Negative Logits
prétend   -1.24
squa      -1.23
ftu       -1.22
soigne    -1.19
dispen    -1.19
Augu      -1.15
fta       -1.14
inev      -1.14
Keny      -1.13
secon     -1.13
Positive Logits
whose     1.22
whose     1.00
which     0.94
who       0.90
whom      0.86
which     0.79
ซึ่ง        0.76
Whose     0.72
которые   0.72
where     0.71
Activations Density: 0.257%