INDEX
Explanations
first-person plural pronouns combined with verbs indicating action
Neuron Alignment
Index   Value   % of L₁
478     +0.21   0.7%
658     +0.15   0.5%
1376    +0.12   0.4%
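
The "% of L₁" column appears to express each neuron's alignment value as a share of the L₁ norm (sum of absolute values) of the feature's full alignment vector. The three rows shown are consistent with that reading, since 0.21 / 0.7%, 0.15 / 0.5%, and 0.12 / 0.4% all give an L₁ norm of about 30. A minimal sketch of that arithmetic follows, assuming this interpretation is right. The function name `percent_of_l1` and the toy vector are hypothetical and do not come from the dashboard.

```python
def percent_of_l1(values):
    """Return each entry's absolute value as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means |v_i| / sum_j |v_j|, expressed in percent.
    """
    l1_norm = sum(abs(v) for v in values)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; percentages are undefined")
    return [100.0 * abs(v) / l1_norm for v in values]


# Hypothetical toy vector (not dashboard data); its L1 norm is 10.
toy_alignment = [3, -1, 1, 5]
shares = percent_of_l1(toy_alignment)
print(shares)  # [30.0, 10.0, 10.0, 50.0]
```

Under the same assumption, a small percentage for the top neurons (under 1% each here) would suggest that the feature's alignment is spread across many neurons rather than concentrated in a few.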
Correlated Neurons
Index   P. Corr.   Cos Sim.
658     +0.21      0.12
478     +0.15      0.09
862     +0.12      0.06
Negative Logits
<bos>          -0.62
arată          -0.60
offerts        -0.56
͜ʖ             -0.52
transfé        -0.52
jemanden       -0.49
écout          -0.49
BorderLayout   -0.48
devront        -0.48
الوطنيه        -0.48
Positive Logits
alre           0.89
guarante       0.85
intersper      0.83
inappro        0.79
emphat         0.77
depic          0.76
reluct         0.75
disreg         0.74
encomp         0.74
desir          0.74
Activations Density 0.476%