Explanations
descriptions of science fiction scenarios involving robots and humanity
Neuron Alignment
Index   Value   % of L₁
184     +0.18   0.5%
2015    +0.10   0.3%
1577    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
736     +0.18      0.06
163     +0.10      0.02
273     +0.09      0.03
Negative Logits
saar       -1.32
silikon    -1.27
makro      -1.26
kasa       -1.25
antik      -1.23
maksi      -1.23
teras      -1.22
keramik    -1.21
bandung    -1.19
seksi      -1.16
Positive Logits
Vielleicht    0.85
Dziękuję      0.84
Bardzo        0.82
Genau         0.77
necesar       0.76
Vielen        0.76
Dzięki        0.75
Și            0.74
Obrigada      0.73
Finalmente    0.73
Activations Density 0.439%