Explanations
verbs indicating past habitual actions
Neuron Alignment
Index    Value    % of L₁
499      +0.10    0.3%
198      +0.09    0.3%
680      +0.09    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
499      +0.10       0.04
865      +0.09       0.03
198      +0.09       0.03
Negative Logits
Token                   Logit
llons                   -0.55
المعيارى                -0.52
dures                   -0.49
nesc                    -0.47
AYLOR                   -0.46
(token text missing)    -0.44
thè                     -0.44
annique                 -0.44
протягом                -0.44
ukunfts                 -0.43
Positive Logits
Token     Logit
ftu       0.91
perfon    0.89
«<        0.89
juft      0.89
feen      0.87
»>        0.87
fta       0.86
ftre      0.86
fep       0.86
tranf     0.84
Activations Density: 0.159%