INDEX
Explanations
sentences describing technical details and procedures
Neuron Alignment
Index   Value   % of L₁
1535    +0.24   0.8%
2034    +0.19   0.6%
184     +0.18   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.24      0.07
1535    +0.19      0.06
184     +0.18      0.02
Negative Logits
Darum      -0.87
Zwar       -0.85
Endlich    -0.76
Tudo       -0.62
Daß        -0.61
Muitos     -0.61
Kdo        -0.59
万美元     -0.58
Hoje       -0.57
Enquanto   -0.56
Positive Logits
lts        0.96
afp        0.89
Telex      0.85
fter       0.83
Quod       0.83
Juf        0.82
fte        0.81
lele       0.81
épis       0.80
scottish   0.80
Activations Density 0.144%