INDEX
Explanations
The word "to" in various contexts, with a strong preference for the preposition "to" across different phrases and sentences.
Neuron Alignment
Index   Value   % of L₁
1334    +0.16   0.5%
674     +0.13   0.4%
1415    +0.13   0.4%
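
One plausible reading of the "% of L₁" column is that it reports each neuron's share of the L₁ norm of the feature's weight vector. The dashboard does not state this, so the sketch below rests on that assumption; the function name `l1_share` and the toy vector are hypothetical and are not taken from the data above.

```python
def l1_share(weights):
    """Return each entry's share of the vector's L1 norm, as a fraction in [0, 1]."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("a zero vector has no defined L1 shares")
    return [abs(w) / total for w in weights]


# Toy vector for illustration only (not the feature's real weights).
toy_weights = [0.5, -0.25, 0.25]
for index, share in enumerate(l1_share(toy_weights)):
    print(f"{index}\t{toy_weights[index]:+.2f}\t{share:.1%}")
```

Under this reading, small percentages such as 0.4% to 0.5% for the top neurons would mean the feature's weight is spread across many neurons rather than concentrated in a few.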
Correlated Neurons
Index   P. Corr.   Cos Sim.
1415    +0.16      0.06
1334    +0.13      0.07
411     +0.13      0.04
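
The two columns measure different things, which is why they need not agree. Assuming "P. Corr." is a Pearson correlation and "Cos Sim." is a cosine similarity (the dashboard does not say which vectors each one is computed over), the sketch below shows how the two relate: Pearson correlation is cosine similarity applied after mean-centering. The function names and sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy vectors for illustration only: perfectly anti-correlated,
# yet their cosine similarity is positive because all entries are positive.
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
print(pearson_correlation(a, b), cosine_similarity(a, b))
```

The toy pair shows that the sign and size of the two measures can differ substantially on the same inputs, so a row with a modest correlation and a near-zero cosine similarity is not contradictory.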
Negative Logits
thut        -1.31
fta         -1.31
Juf         -1.26
fup         -1.22
aen         -1.22
Intere      -1.21
Mémoires    -1.21
fays        -1.19
fte         -1.18
guarante    -1.17
Positive Logits
<bos>       0.66
be          0.62
protect     0.61
avoid       0.61
achieve     0.60
improve     0.60
asts        0.59
establish   0.58
expand      0.58
minimize    0.58
Activations Density: 0.235%