INDEX
Explanations
historical dates and events related to real-world locations or names
Neuron Alignment
Index   Value   % of L₁
776     +0.12   0.4%
1013    +0.11   0.3%
1870    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
648     +0.12      0.05
499     +0.11      0.05
321     +0.10      0.05
Negative Logits
guarante   -1.30
affez      -1.26
effe       -1.23
fte        -1.18
desir      -1.16
volunte    -1.14
nece       -1.14
alre       -1.12
emphat     -1.11
thut       -1.11
Positive Logits
році       0.56
년         0.55
зді        0.55
году       0.55
Çünkü      0.55
годах      0.54
年         0.54
year       0.54
hline      0.53
yılında    0.51
Activations Density: 0.106%